Abstract

We initiate a study of when the value of mathematical relaxations such as linear and semi-definite 
programs for constraint satisfaction problems (CSPs) is approximately preserved when restricting the 
instance to a sub-instance induced by a small random subsample of the variables. 

Let $\mathcal{C}$ be a family of CSPs such as 3SAT, Max-Cut, etc., and let $\Pi$ be a mathematical program that is a
relaxation for $\mathcal{C}$, in the sense that for every instance $\mathcal{P} \in \mathcal{C}$, $\Pi(\mathcal{P})$ is a number in $[0, 1]$ upper bounding the
maximum fraction of satisfiable constraints of $\mathcal{P}$. Loosely speaking, we say that subsampling holds for $\mathcal{C}$
and $\Pi$ if for every sufficiently dense instance $\mathcal{P} \in \mathcal{C}$ and every $\varepsilon > 0$, if we let $\mathcal{P}'$ be the instance obtained
by restricting $\mathcal{P}$ to a sufficiently large constant number of variables, then $\Pi(\mathcal{P}') \in (1 \pm \varepsilon)\Pi(\mathcal{P})$. We say that
weak subsampling holds if the above guarantee is replaced with $\Pi(\mathcal{P}') = 1 - \Theta(\gamma)$ whenever $\Pi(\mathcal{P}) = 1 - \gamma$,
where $\Theta$ hides only absolute constants. We obtain both positive and negative results, showing that:

1. Subsampling holds for the BasicLP and BasicSDP programs. BasicSDP is a variant of the semi-
definite program considered by Raghavendra (2008), who showed it gives an optimal approximation
factor for every constraint-satisfaction problem under the unique games conjecture. BasicLP is the 
linear programming analog of BasicSDP. 

2. For tighter versions of BasicSDP obtained by adding additional constraints from the Lasserre 
hierarchy, weak subsampling holds for CSPs of unique games type. 

3. There are non-unique CSPs for which even weak subsampling fails for the above tighter semi- 
definite programs. Also there are unique CSPs for which (even weak) subsampling fails for the 
Sherali-Adams linear programming hierarchy. 

As a corollary of our weak subsampling for strong semi-definite programs, we obtain a polynomial- 
time algorithm to certify that random geometric graphs (of the type considered by Feige and Schechtman, 
2002) of max-cut value $1 - \gamma$ have a cut value at most $1 - \gamma/10$. More generally, our results give an
approach to obtaining average-case algorithms for CSPs using semi-definite programming hierarchies. 
1 Introduction



In this paper we consider the following seemingly unrelated questions: 

1. Is the Max Cut problem hard on random geometric graphs of the type considered by Feige and 
Schechtman [FS02]? 

2. Is the value of a mathematical relaxation for a constraint-satisfaction problem (CSP) preserved when 
one passes from an instance $\mathcal{P}$ to a random induced sub-formula of $\mathcal{P}$?

It turns out that (in a sense made precise below) the answer to the first question is "no" and in fact this is 
intimately related to the second question. The answer to the second question is much more subtle, and, in 
contrast to the case of the objective value^1 of the CSP, the answer strongly depends on the type of relaxation
and CSP. 

1.1 Max Cut on the sphere 

Max Cut — the problem of finding a cut maximizing the number of cut edges — is a widely studied 
optimization problem, important both in its own right, and as a testbed for techniques in algorithms and 
hardness of approximation. The best approximation algorithm for Max Cut known today is the semi-definite 
program GW SDP of Goemans and Williamson [GW94], which is optimal in the worst-case under the 
unique games conjecture [KKMO04, MOO05]. GW SDP is a special case of the BasicSDP algorithm for 
CSPs considered by Raghavendra [Rag08], who showed that the latter algorithm always has an optimal 
approximation factor in the worst-case under the unique games conjecture. 

In particular GW SDP gives a value of at most $1 - \Omega(\varepsilon^2)$ when given as input a graph whose maximum
cut cuts a $1 - \varepsilon$ fraction of the edges.^2 In this work we study the average-case complexity of Max Cut, namely
whether one can do better on natural distributions over the instances. Since random graphs are expanders 
and so obviously have a maximum cut value close to 1/2 (and moreover this fact can be efficiently certified 
using the second eigenvalue), one needs to consider other distributions over the inputs. We consider random 
geometric graphs, that in light of known results, arguably constitute the most natural distribution of Max Cut 
instances that is not obviously easy. 

Random geometric graphs. A random geometric graph is obtained by taking the vertices as random
unit vectors in $\mathbb{R}^d$, and connecting two vertices $u, v$ based on their distance $\|u - v\|_2$. We consider
the distribution $\mathcal{G}_{n,d,\gamma}$, where the vertices are $n$ random unit vectors in $\mathbb{R}^d$, and we connect two vectors
if $\|u - v\|_2 \ge 2\sqrt{1 - \gamma}$. By construction, GW SDP will have value $1 - \gamma$ on these graphs, but, as shown
by [FS02], as long as $n$ is not too small these graphs will have with high probability a maximum cut value
of $1 - c\sqrt{\gamma}$ for some absolute constant $c$. Moreover, as we observe here, for a suitable choice of $n$, these
graphs will also be hard instances for the Sherali-Adams [SA90] linear programming hierarchies; these are
generally incomparable with GW SDP and have been shown to solve Max Cut on dense graphs [dlVKM07].
Nevertheless, we show here that these graphs can be certified to have small max cut in polynomial time. (A 
certification algorithm that the max-cut of a random graph from a distribution is at most v is an algorithm 
whose output always upper bounds the max-cut, and with high probability the output is at most v.) 
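
The construction above can be simulated directly. The following is a minimal sketch, assuming Gaussian sampling followed by normalization to obtain uniform unit vectors and a non-strict distance threshold; the function names are ours and purely illustrative. It also evaluates the Max Cut SDP objective (the average of $\|u - v\|_2^2/4$ over edges) on the sampled points themselves, which form a feasible vector solution in which every edge contributes at least $1 - \gamma$.

```python
import math
import random

def random_unit_vector(d, rng):
    """Uniform point on the unit sphere in R^d (normalized Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def sample_geometric_graph(n, d, gamma, rng):
    """Sample n unit vectors and connect pairs with ||u - v|| >= 2*sqrt(1 - gamma)."""
    points = [random_unit_vector(d, rng) for _ in range(n)]
    threshold_sq = 4.0 * (1.0 - gamma)
    edges = [(i, j)
             for i in range(n) for j in range(i + 1, n)
             if sq_dist(points[i], points[j]) >= threshold_sq]
    return points, edges

def embedding_value(points, edges):
    """Max Cut SDP objective of the sampled points: average of ||u - v||^2 / 4."""
    return sum(sq_dist(points[i], points[j]) / 4.0 for i, j in edges) / len(edges)

if __name__ == "__main__":
    rng = random.Random(0)
    points, edges = sample_geometric_graph(n=200, d=3, gamma=0.3, rng=rng)
    print(len(edges), embedding_value(points, edges))
```

The quadratic pair loop is only meant for small experiments; it makes no attempt to reach the regime of $n$ and $d$ that the theorems address.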

^1 A $k$-CSP is a collection $\mathcal{P}$ of functions mapping $n$ variables from some finite alphabet to $\{0, 1\}$, such that every $P \in \mathcal{P}$ depends on
at most $k$ variables. We define the objective value of a CSP to be the maximum of $\frac{1}{|\mathcal{P}|} \sum_{P \in \mathcal{P}} P(x)$ taken over all possible assignments
$x$ to the variables.

^2 As a relaxation for a maximization problem, the value of GW SDP is always at least as large as the integral objective value.
Hence the fact that the relaxation outputs some value $v$ for an instance $G$ is a certification that the maximum cut of $G$ is at most $v$.
Informal Theorem 1 (Max Cut on random geometric graphs, see Theorem 5.5). There is a polynomial-time
algorithm that certifies that a random graph $G$ from $\mathcal{G}_{n,d,\gamma}$ satisfies $\mathrm{Max\,Cut}(G) \le 1 - \Omega(\sqrt{\gamma})$, for every
$\gamma \in (0, 1)$, $d \in \mathbb{N}$ and $n \ge C(\gamma)/\mu(\gamma, d)$, where $\mathrm{Max\,Cut}(G)$ denotes the fraction of edges cut by the maximum
cut in $G$, $C(\gamma)$ is some constant depending only on $\gamma$, and $\mu(\gamma, d)$ denotes the normalized measure in the unit
sphere of the ball of radius $\sqrt{2\gamma}$ around some unit vector.

By a simple calculation one can show that the probability that two random unit vectors $u, v$ in $\mathbb{R}^d$ will
satisfy $\|u - v\|_2 \ge 2\sqrt{1 - \gamma}$ is exactly $\mu(\gamma, d)$, implying that if $n \ll 1/\mu(\gamma, d)$ the graph $\mathcal{G}_{n,d,\gamma}$ will have average
degree $\ll 1$ (and hence has a trivial large max cut). Thus the value of $n$ that Theorem 1 applies to is at most a
constant factor larger than the minimum possible. The algorithm A of Theorem 1 is simply a tightening of 
the relaxation GW SDP obtained by adding the so-called "triangle inequalities" to that program. 

1.2 Subsampling mathematical relaxations 

The other question we consider is whether the value of mathematical relaxations such as linear and semi-
definite programming is preserved under subsampling. That is, given a CSP instance $\phi$ on $n$ variables, we
consider the instance $\phi'$ obtained by choosing at random $S \subseteq [n]$ of some specified size, and keeping only the
constraints involving only variables in $S$. We ask in what cases the value of the relaxation of $\phi'$ is close to the
value of $\phi$.
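
To make the subsampling operation concrete, here is a small sketch under our own encoding assumptions: a constraint is a pair `(scope, predicate)`, where `scope` is a tuple of variable indices and `predicate` returns a value in $[0, 1]$. The helper names are hypothetical. The sketch builds the induced sub-instance and, for tiny instances only, computes the objective value (not a relaxation value) by brute force.

```python
import itertools
import random

def induced_subinstance(constraints, sample):
    """Keep exactly the constraints whose variables all lie in the sample."""
    sample = set(sample)
    return [(scope, pred) for scope, pred in constraints
            if all(v in sample for v in scope)]

def objective_value(constraints, variables, alphabet=(0, 1)):
    """Maximum average pay-off over all assignments (exponential time)."""
    variables = sorted(variables)
    if not constraints:
        return 0.0
    best = 0.0
    for values in itertools.product(alphabet, repeat=len(variables)):
        x = dict(zip(variables, values))
        total = sum(pred(*(x[v] for v in scope)) for scope, pred in constraints)
        best = max(best, total / len(constraints))
    return best

def cut(a, b):
    """Max Cut predicate: pay-off 1 if the edge is cut."""
    return 1 if a != b else 0

if __name__ == "__main__":
    n = 10
    # Max Cut on the complete graph, a maximally dense 2-CSP.
    constraints = [((i, j), cut) for i in range(n) for j in range(i + 1, n)]
    sample = random.Random(1).sample(range(n), 6)
    sub = induced_subinstance(constraints, sample)
    print(objective_value(constraints, range(n)), objective_value(sub, sample))
```

The same `induced_subinstance` step is what one would apply before handing the smaller instance to an LP or SDP solver; the brute-force evaluator is included only so the example runs without external dependencies.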

This question is a variant of property testing [Ron00, Rub06] that we believe is interesting in its own
right. It also has algorithmic applications. Subsampling gives a fast way to "sketch" a CSP in a way that
preserves the objective value but using a much smaller instance size. But since we generally cannot
compute this objective value in the worst case, we'd want to make sure that if $\phi$ was an "easy instance" for
our algorithm, then $\phi'$ will be such an instance as well. A subsampling theorem for mathematical relaxations
guarantees this property. 

Subsampling for the objective value of constraint satisfaction problems (namely the fraction of satisfied
constraints) was studied before by Goldreich, Goldwasser and Ron [GGR98] who gave a subsampling 
theorem for Max Cut, and by Alon, de la Vega, Kannan and Karpinski [AdlVKK03] who gave a subsampling 
theorem for general CSPs. But, to our knowledge, subsampling for mathematical relaxations was not studied 
before. As we show, unlike the case of the objective value, subsampling sometimes fails for the value of 
relaxations, and this depends on the particular relaxation and CSP. 

Another, more minor difference between prior works and ours is that while prior works focused on the
dense case, considering $k$-CSPs with $\Omega(n^k)$ constraints, we consider general, possibly non-dense, CSPs, and
wish to optimize the trade-off between the sample size and density. We say that a 2-CSP is $\Delta$-dense if every
variable appears in at least $\Delta$ constraints, and use a suitable generalization of this notion to $k$-CSPs (see
Section 4). We show a subsampling theorem for the objective value of $\Delta$-dense CSPs with the optimal sample
of size $O(n/\Delta)$. Namely, we show that the value of the induced instance is equal to the value of the original
instance up to a $1 \pm \varepsilon$ multiplicative factor, where the $O$ notation in the sample size hides polynomial factors in
$1/\varepsilon$. The only prior work to consider this trade-off was by Feige and Schechtman [FS02], who gave such a
result for Max Cut with $O(n \log n/\Delta)$ sample size.

Our results for subsampling mathematical relaxations of CSPs are the following (see Sections 4.1 and 7 for
formal statements). In all cases we consider a $\Delta$-dense CSP $\mathcal{P}$ and a subformula $\mathcal{P}'$ of $\mathcal{P}$ induced on a random
subset of $\mathrm{poly}(1/\varepsilon)(n/\Delta)$ variables, and we let $\Pi(\mathcal{P})$ be the value of the relaxation $\Pi$ on $\mathcal{P}$.^3

^3 For the positive results, our sample size is as small as possible; the negative results hold also for much larger sample size and in
particular show that one cannot get a constant size subset even if $\Delta = \Omega(n)$, see Section 6.
We start by showing that subsampling holds for BasicSDP and BasicLP, where BasicSDP is the semi-
definite program considered by Raghavendra [Rag08] and BasicLP is its linear programming analog.^4

Informal Theorem 2 (Subsampling for BasicSDP and BasicLP, see Section 4.1). In the notation above, for
any CSP $\mathcal{P}$ and for $\Pi$ that is either BasicSDP or BasicLP,

$\Pi(\mathcal{P}) - \varepsilon \le \Pi(\mathcal{P}') \le \Pi(\mathcal{P}) + \varepsilon$

We then show that for stronger SDPs, we still have weak subsampling if the CSP is a unique game.

Informal Theorem 3 (Subsampling for unique games, see Theorem 7.2). In the notation above, if $\mathcal{P}$ is a
unique game, then for every $k \in \mathbb{N}$, letting $\gamma = 1 - \mathrm{BasicSDP}_k(\mathcal{P})$,

$1 - \gamma - \varepsilon \le \mathrm{BasicSDP}_k(\mathcal{P}') \le 1 - \gamma/9 + \varepsilon$

where $\mathrm{BasicSDP}_k$ denotes BasicSDP augmented with $k$ rounds of the Lasserre hierarchy.

Theorem 3 is the main technical contribution of this paper, and also the one used to obtain our algorithm 
for Max Cut on random geometric graphs. We also have negative results that complement our positive results 
and show that, in contrast to the case of the objective value, subsampling sometimes fails for mathematical
relaxations. 

Informal Theorem 4 (Negative results for subsampling, see Theorems 6.1 and 6.2). There is a (non-
unique) CSP $\mathcal{P}$ and absolute constant $\delta > 0$ for which $\mathrm{BasicSDP}_{O(1)}(\mathcal{P}) \le 1 - \delta$ but with high probability
$\mathrm{BasicSDP}_{\omega(1)}(\mathcal{P}') \ge 1 - o(1)$. There is a unique CSP $\mathcal{P}$ and absolute constant $\delta > 0$ for which $\mathrm{BasicLP}_3(\mathcal{P}) \le
1 - \delta$ but with high probability $\mathrm{BasicLP}_{\omega(1)}(\mathcal{P}') \ge 1 - o(1)$, where $\mathrm{BasicLP}_k$ denotes BasicLP augmented with
$k$ rounds of the Sherali-Adams hierarchy, and $o(1)$ (resp. $\omega(1)$) denotes a function that tends to $0$ (resp. $\infty$)
with $n$.

See Figure 1 for an overview of our positive and negative results on subsampling mathematical relaxations. 
As one can see, we cover most of the cases. The most interesting (in our opinion) open question is
whether strong semidefinite programs for unique CSPs such as Max Cut actually have strong subsampling,
in the sense that the value of the program on a subsample approximates the value on the original instance
with arbitrary accuracy. We suspect that the answer is "no", though we have no proof of that.

1.3 Subsampling SDPs and average-case complexity. 

As mentioned above, we use Theorem 3 (weak subsampling theorem for strong SDPs on unique CSPs)
to show Theorem 1, the Max Cut algorithm for random geometric graphs. Theorem 1 is obtained from
our subsampling theorem as follows: we first show that $\mathrm{BasicSDP}_3(G_{d,\gamma}) \le 1 - \Omega(\sqrt{\gamma})$ where $\mathrm{BasicSDP}_3$
denotes the BasicSDP program augmented with the triangle inequalities, and $G_{d,\gamma}$ is the graph on the
continuous $d$-dimensional sphere where we connect two unit vectors $u, v$ if $\|u - v\|_2 \ge 2\sqrt{1 - \gamma}$. (Equivalently,
one can think of $G_{d,\gamma}$ as a random graph from $\mathcal{G}_{n,d,\gamma}$ where $d, \gamma$ are fixed and $n$ tends to infinity.) We show
this by observing that the edges of $G_{d,\gamma}$ can be partitioned into an essentially disjoint union of odd cycles of length
$O(1/\sqrt{\gamma})$, and noting that triangle inequalities can capture the fact that one cannot cut all the edges of an odd
cycle. Since random geometric graphs are simply subsamples of $G_{d,\gamma}$, our subsampling theorem implies that
$\mathrm{BasicSDP}_3$ will have value $1 - \Theta(\sqrt{\gamma})$ for these graphs.

^4 We actually use the "smoothed" version of Raghavendra's SDP considered in [RS09, Ste10]; see Section 4.1. The two
programs are closely related, and [Rag08]'s result holds for the smoothed version as well.
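
The odd-cycle step in the argument of Section 1.3 can be made explicit. The following is a sketch under the assumption that the triangle inequalities are imposed on the set $\{\pm v_i\}$ of the unit vectors and their negations, as is standard for Max Cut; the notation $x_{ij}$ and $d(\cdot,\cdot)$ is ours.

```latex
% Assumption: triangle inequalities hold for d on the set {+v_i, -v_i}.
% x_{ij} = \|v_i - v_j\|^2 / 4 is the SDP contribution of the edge (i,j).
\[
  d(a,b) := \tfrac{1}{4}\|a-b\|_2^2,
  \qquad
  d(v_i,-v_j) = \tfrac{1}{4}\|v_i+v_j\|_2^2 = 1 - x_{ij}
  \quad\text{for unit vectors } v_i, v_j .
\]
% For an odd cycle 1,2,\dots,\ell,1, alternate signs along the path
% v_1, -v_2, v_3, \dots, -v_{\ell-1}, v_\ell and chain the triangle inequality:
\[
  x_{\ell 1} = d(v_1, v_\ell)
  \;\le\; \sum_{i=1}^{\ell-1} d\bigl((-1)^{i+1} v_i,\; (-1)^{i} v_{i+1}\bigr)
  \;=\; \sum_{i=1}^{\ell-1} \bigl(1 - x_{i,i+1}\bigr).
\]
% Rearranging bounds the total contribution of the cycle edges:
\[
  \sum_{e \in \text{cycle}} x_e \;\le\; \ell - 1,
  \qquad\text{so the average contribution per edge is at most } 1 - \tfrac{1}{\ell}.
\]
```

With cycles of length $\ell = O(1/\sqrt{\gamma})$ this gives an average contribution of at most $1 - \Omega(\sqrt{\gamma})$ on each cycle, which matches the form of the bound claimed for the continuous graph.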
[Figure 1 diagram. Nodes: Lasserre SDP (✗ in general; unique: weak subsampling), Basic SDP (✓), Sherali-Adams LP (✗), Basic LP (✓), Integer Program / Objective Value (✓, [AdlVKK03]).]

Figure 1: Overview of subsampling results. ✓ denotes a subsampling theorem, while ✗ denotes that subsampling fails.
Arrows point from weaker to tighter relaxations.



This algorithm is an instance of a general recipe for using our subsampling theorem for average-case 
algorithms. Many natural distributions can be thought of as random subsamples of some instance (or family of
instances) $\phi$ (e.g., random graphs are subsamples of a random dense graph, random 3SAT are subsamples of
a random dense formula). In such cases, if one can give a relaxation that gives a tight value on $\phi$ (perhaps by
exploiting its density) and the relaxation admits subsampling, then it follows that the relaxation succeeds on
the distribution of subsamples as well. In our case, even though sufficiently many rounds of the Sherali-Adams
hierarchy give a tight value on $G_{d,\gamma}$ (since, considering $\gamma$ as constant, it is dense), we cannot use those directly
as they do not admit subsampling. Similarly, even though BasicSDP admits subsampling, it does not yield a 
tight value on dense 3SAT formulas, which is the reason our results do not refute Feige's hypothesis [Fei02] 
on the hardness of certifying that random 3SAT formulas are unsatisfiable. 

We note that subsampling theorems have been used before for approximation algorithms for CSPs, but 
in a different way. Prior works used subsampling of the objective value to show worst-case approximation 
algorithms for dense graphs, by showing that one can first subsample to constant size and then solve the 
problem using brute force on the sample [AdlVKK03] (or use that argument to show that linear programming 
hierarchies will succeed on the original instance [dlVKM07]). In contrast we use subsampling of the
relaxation value to give average-case algorithms on some specific distributions of (possibly sparse) graphs. 
Our result is also one of the few examples where higher order SDPs can succeed in an algorithmic task in 
which BasicSDP fails. As mentioned above, if the unique games conjecture is true, then BasicSDP is an 
optimal worst-case approximation algorithm for CSPs, though of course it can be worse than other efficient 
algorithms on some (distributions of) inputs. 

1.4 Related work 

As mentioned above, there have been many works on estimating graph parameters from random small induced
subgraphs of dense graphs. Goldreich, Goldwasser and Ron [GGR98] show that the Max-Cut value of a
dense graph (degree $\Omega(n)$) is preserved by subsampling. (In this and other results, the constants depend on
the quality of estimation.) Feige and Schechtman [FS02] showed that the result holds generally for $\Delta$-dense
graphs so long as the degree $\Delta \ge \Omega(\log n)$ and the subgraph is of size at least $\Omega(n \log n/\Delta)$. (As a corollary of
our results, we slightly strengthen [FS02]'s bounds to hold for any $\Delta > [\ldots]$ and subgraph size larger than
$\Omega(n/\Delta)$.) Alon et al. [AdlVKK03] generalize [GGR98] to $k$-CSPs and improve their quantitative estimates.
See also [RV07] for further quantitative improvements in the case of $k = 2$.

There has also been much work on matrix and graph sparsification by means other than uniform sampling, 
see for instance [ST04, AHK06, AM07, SS08, BSS09]. Indeed, spectral sparsifiers are stronger than the
notion we consider, in the sense that passing to a spectral sparsifier will preserve the SDP value for, say, Max
Cut. Algorithmically though, if one only wants to preserve the SDP value, there are some advantages to
subsampling, as it reduces not just the number of edges but also the number of vertices, hence potentially 
yielding sublinear algorithms, and can also be carried out very efficiently by just random sampling, reducing 
to a subgraph of constant degree. In contrast constant degree spectral sparsification [BSS09] cannot be 
achieved by sampling vertices (or even edges for that matter) uniformly at random, even for regular graphs. 

2 Overview of proofs 

In this section we give a high level overview of our proofs, focusing on our main result, Theorem 3, showing
a weak subsampling theorem for strong semidefinite programs for unique games. A $k$-CSP $\mathcal{P}$ on an alphabet
$[q]$ is a collection of local functions (called "constraints") from $[q]^n$ to $[0, 1]$, where for $x \in [q]^n$ we denote
$\mathcal{P}(x) = \frac{1}{|\mathcal{P}|} \sum_{P \in \mathcal{P}} P(x)$. If $U$ is a set of variables, then $\mathcal{P}[U]$ denotes the restriction of $\mathcal{P}$ to those constraints that
depend only on variables in $U$. We'll let $\mathcal{P}'$ denote $\mathcal{P}[U]$ where $U$ is a random subset of size set to an appropriate
parameter (that we ignore in this overview).

2.1 Subsampling for k-CSPs and BasicSDP

Alon, de la Vega, Kannan and Karpinski [AdlVKK03] proved a subsampling theorem for $k$-CSPs. As a first
step, we extend their results to hold with a better dependency between the sample size and density, and
to hold for constraints that can output a real number, say in $[0, 1]$, rather than just a Boolean value. The
latter extension is trivial, but the former (which we need for our Max Cut application) requires some work,
adapting and refining techniques of [GGR98, FS02, dlVKM07]. Our subsampling theorem for (generalized)
$k$-CSPs is stated in Section 4 and proven in Section 8.

Subsampling for BasicSDP. BasicSDP is the semi-definite program for $k$-CSPs considered by Raghaven-
dra [Rag08], who showed that it gives an optimal approximation ratio in the worst-case under the unique
games conjecture. For a given $k$-CSP $\mathcal{P}$ over alphabet $[q]$ this program assigns a vector $v_{i,a}$ for every variable
$x_i$ and alphabet symbol $a \in [q]$ of $\mathcal{P}$. It also assigns $q^k$ numbers $\mu_{P,1}, \ldots, \mu_{P,q^k}$ for every constraint $P$ of $\mathcal{P}$. It
makes the following consistency requirement on $\{v_{i,a}\}$ and $\{\mu_{P,x}\}$: the inner product of $v_{i,a}$ and $v_{j,b}$ should
match the probability of the event "$x_i = a$ AND $x_j = b$" in any local distribution $\mu_P$ involving both variables
$x_i$ and $x_j$ (this can be captured by linear and semi-definite conditions). The value of the CSP is simply the
expectation of $P(x)$ over a random constraint $P$ and a random partial assignment $x$ chosen from $\mu_P$. (To avoid
the potential issue of the SDP being extremely sensitive to few of the constraints, we follow [RS09, Ste10] in
allowing a bit of slackness in the consistency constraints on $\mu_P$.)

Our subsampling theorem for BasicSDP, proven in Section 4.1, follows from the general subsampling for 
$k$-CSPs. The idea is to combine two observations: (1) because the assignment to the vectors $\{v_{i,a}\}$ determines
the best choice for the local distribution, it is possible to write BasicSDP as a program that has no constraints 
and needs to maximize a sum of local functions over these vectors, (2) one can use dimension reduction to 
assume that the vectors have constant dimension with little loss of accuracy [RS09]. Thus by discretizing this 
constant dimensional space, we can think of BasicSDP as itself a CSP over some constant sized alphabet, 
and apply our $k$-CSP subsampling theorem to this CSP. A similar (even simpler) reasoning applies to the
linear programming variant BasicLP, and also to quadratic programs, in particular implying a variant of 
property testing for positive semi-definiteness, see Section 4. 
2.2 Weak subsampling for strong SDPs

We now give a high level overview of the proof of Theorem 3. Because stronger SDPs such as those from the 
Lasserre hierarchy actually involve constraints including several vectors, they cannot be expressed as a CSP 
in the same way as BasicSDP. Indeed, we have negative results showing that subsampling can fail for these 
SDPs (see Sections 2.3 and 6). 

The result actually does not depend on the particular properties of the Lasserre hierarchy, and holds for a 
very general class of strong SDPs. We start by formalizing this class. Any strong SDP can be thought of as 
the program BasicSDP augmented with the constraint that the positive semi-definite matrix X of the inner 
products of all these vectors is in some convex set M. But one needs the set M to satisfy some "niceness 
conditions" in order for it to make sense to apply the program on a subsampled CSP. The niceness conditions 
we consider are rather mild, and require that solutions remain valid under renaming and identifying of vertices 
(see Section 7). In particular they apply to any SDP obtained by adding a number of Lasserre rounds to 
BasicSDP. 

If $\Pi$ is any strong SDP, $\mathcal{P}$ is a CSP, and $\mathcal{P}'$ is a subsample of $\mathcal{P}$, then it's not hard to show that with high
probability $\Pi(\mathcal{P}') \ge \Pi(\mathcal{P}) - \varepsilon$, since this only needs the argument that the value of one solution (the optimal
one for $\mathcal{P}$) will be approximately preserved. The challenging task is to show that $\Pi(\mathcal{P}')$ is not much larger
than $\Pi(\mathcal{P})$, and because subsampling does not always hold for SDPs, we know that the proof for subsampling
of $k$-CSPs does not generalize to this case.

The crucial notion we use is that of a proxy CSP. Let $\mathcal{G}$ and $\mathcal{H}$ be two unique games on the same
alphabet and number of variables. We say that $\mathcal{H}$ is a proxy for $\mathcal{G}$ (with respect to the program $\Pi$), if for every
assignment $X$ (even possibly outside $M$) to the vectors of $\Pi$, $1 - \Pi(\mathcal{G})[X] \le 1 - \Pi(\mathcal{H})[X]/10$, where $\Pi(\mathcal{P})[X]$
denotes the value of the program $\Pi$ on the CSP $\mathcal{P}$ with assignment $X$ to the vectors of $\Pi$.^5 That is, one can
think of $\mathcal{H}$ as pointwise dominating $\mathcal{G}$ with respect to the program $\Pi$. We then show that this domination
condition is somewhat preserved under subsampling, at least for the optimal solutions. That is, we show that
with high probability $1 - \Pi(\mathcal{G}') \le 1 - \Pi(\mathcal{H}')/10 + \varepsilon$, where $\mathcal{G}'$ and $\mathcal{H}'$ are the subsampled versions of $\mathcal{G}$ and
$\mathcal{H}$. The idea here is to use our subsampling theorem for basic SDPs, looking at the SDP $\max_X \Pi(\mathcal{H})[X]/10 - \Pi(\mathcal{G})[X]$.
This is a basic SDP since it places no constraints on $X$, and so since we know its optimum is at most 0, this
should be approximately preserved under subsampling.

The above discussion shows that to prove Theorem 3 it suffices to find some unique game $\mathcal{H}$ such that
(*) $\mathcal{H}$ is a proxy for $\mathcal{G}$ and (**) with high probability $1 - \Pi(\mathcal{H}') \le 1 - \Pi(\mathcal{G}) + \varepsilon$. This is what we do. The
proxy game $\mathcal{H}$ is simply the game obtained by taking all length-3 paths in the constraint graph of $\mathcal{G}$ and
composing the corresponding permutations. Condition (*) is not that hard to show. Intuitively, an assignment
that satisfies a $1 - \gamma$ fraction of the constraints of $\mathcal{G}$ should satisfy about a $1 - 3\gamma$ fraction of the constraints of $\mathcal{H}$
(since each one is just three constraints of $\mathcal{G}$), and this reasoning carries over to SDP assignments as well.
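
The proxy construction can be sketched for integral assignments as follows. This is an illustration under our own conventions, not the paper's formal definition: a unique game is a dictionary mapping an edge `(u, v)` to a permutation `perm` (a tuple) with the constraint `x[v] == perm[x[u]]`, every edge is traversable in both directions via the inverse permutation, and "length-3 paths" are taken to be all walks of three edges, including backtracking ones. All names are hypothetical.

```python
def inverse(perm):
    """Inverse of a permutation given as a tuple (perm[a] is the image of a)."""
    inv = [0] * len(perm)
    for a, b in enumerate(perm):
        inv[b] = a
    return tuple(inv)

def compose(outer, inner):
    """Return the permutation a -> outer[inner[a]]."""
    return tuple(outer[b] for b in inner)

def both_orientations(edges):
    """Add the reverse arc (with the inverse permutation) for every edge."""
    arcs = {}
    for (u, v), perm in edges.items():
        arcs[(u, v)] = tuple(perm)
        arcs[(v, u)] = inverse(perm)
    return arcs

def proxy_game(edges):
    """One constraint per walk i' -> i -> j -> j', with composed permutation."""
    arcs = both_orientations(edges)
    neighbors = {}
    for (u, v) in arcs:
        neighbors.setdefault(u, []).append(v)
    proxy = []
    for (i, j), middle in arcs.items():
        for i_prime in neighbors[i]:
            for j_prime in neighbors[j]:
                first = arcs[(i_prime, i)]
                last = arcs[(j, j_prime)]
                perm = compose(last, compose(middle, first))
                proxy.append((i_prime, j_prime, perm))
    return proxy

def unsatisfied_fraction(constraints, x):
    """Fraction of constraints (u, v, perm) with perm[x[u]] != x[v]."""
    bad = sum(1 for (u, v, perm) in constraints if perm[x[u]] != x[v])
    return bad / len(constraints)

def cycle_game(x, q):
    """Hypothetical example: shift constraints on a cycle, all satisfied by x."""
    n = len(x)
    edges = {}
    for u in range(n):
        v = (u + 1) % n
        shift = (x[v] - x[u]) % q
        edges[(u, v)] = tuple((a + shift) % q for a in range(q))
    return edges

if __name__ == "__main__":
    x = [0, 1, 2, 1, 0, 2, 2, 1, 0]
    edges = cycle_game(x, 3)
    game = [(u, v, p) for (u, v), p in edges.items()]
    proxy = proxy_game(edges)
    y = list(x)
    y[4] = 1  # perturb one variable so that some constraints are violated
    print(unsatisfied_fraction(game, y), unsatisfied_fraction(proxy, y))
```

On a regular constraint graph each of the three positions of a random walk is a uniformly random arc, so a union bound shows that an integral assignment violating a $\gamma$ fraction of the original constraints violates at most a $3\gamma$ fraction of the proxy constraints, which is the intuition stated above.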

Condition (**) looks suspiciously close to what we're trying to prove in the first place (preservation
of value under subsampling), but note the asymmetry: we need to show that a subsample of $\mathcal{H}$ will have
roughly the same value as the original graph $\mathcal{G}$. It turns out this will actually help us. What we need to show
is a way to decode an assignment for the SDP of the subsampled game $\mathcal{H}'$ into an assignment of roughly the
same value for the SDP of the original game $\mathcal{G}$. For simplicity, assume that the alphabet of the CSP is $\{0, 1\}$,
in which case the vector assignment is just one vector per variable.^6 Suppose that $\mathcal{G}$ has $n$ variables, each

^5 The actual domination condition we use will restrict the possible vector assignments based on the norms of the vectors, but
because we restrict the vectors to a product set, it does not make a difference in our arguments.

^6 Although in the phrasing above it seems that one would need two vectors per variable for alphabet of size 2, it is known how to
transform the SDP into an equivalent program needing only one vector per variable in this case.
participating in $\Delta$ constraints, and we subsample to a set $S$ of size $n' = O(n/\Delta)$ variables.^7 We are given a
vector assignment $\{v'_{i'}\}_{i' \in S}$ for each of the $n'$ variables in the sample that gives value $\tau$ for $\Pi(\mathcal{H}')$, and need
to "decode" it into an assignment $\{v_i\}_{i \in [n]}$ that gives value roughly $\tau$ for $\Pi(\mathcal{G})$. We will use a randomized
decoding, assigning for every variable $i$ of $\mathcal{G}$ the vector $v'_{i'}$ where $i'$ is a random neighbor of $i$ in $\mathcal{G}$ that is
contained in the sample $S$.^8 Let $(i', i, j, j')$ be the length-3 path corresponding to a random constraint of $\mathcal{H}'$
that survived the subsampling. That is, $i', j' \in S$. If the subsampled graph is (approximately) regular, we can
choose $(i', i, j, j')$ in the following way: first let $(i, j)$ be variables corresponding to a random constraint in $\mathcal{G}$,
then take $i'$ to be a random neighbor of $i$ that is also in $S$, and take $j'$ to be a random neighbor of $j$ that is
also in $S$. We know that on average the vectors $v'_{i'}$ and $v'_{j'}$ contribute $\tau$ to the value of $\Pi(\mathcal{H}')$. But then in
expectation the contribution to $\Pi(\mathcal{G})$ of the decoded vectors $v_i$ and $v_j$ is also $\tau$, since $v_i$ is exactly obtained by
taking $v'_{i'}$ for a random neighbor $i' \in S$ of $i$, and $v_j$ is obtained by taking $v'_{j'}$ for a random neighbor $j' \in S$ of $j$.
This concludes the proof. We remark that this reasoning is somewhat reminiscent of Dinur's analysis of her
gap amplification lemma for PCPs [Din07].
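
The randomized decoding step described above can be sketched as follows, assuming the simplified setting of the overview (one vector per variable, all permutations the identity). The data layout, an adjacency dictionary and a dictionary of vectors for the sampled variables, is our own choice, and the function name is hypothetical.

```python
import random

def decode_assignment(adj, sample, sampled_vectors, rng):
    """Give each variable the vector of a uniformly random sampled neighbor.

    adj: dict mapping each variable to the list of its neighbors.
    sample: set of sampled variables S.
    sampled_vectors: dict mapping each variable in S to its vector.
    Variables with no neighbor in S are left undecoded in this sketch.
    """
    decoded = {}
    for i, nbrs in adj.items():
        candidates = [ip for ip in nbrs if ip in sample]
        if not candidates:
            continue
        i_prime = rng.choice(candidates)
        decoded[i] = sampled_vectors[i_prime]
    return decoded

if __name__ == "__main__":
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    sample = {1, 2}
    sampled_vectors = {1: (1.0, 0.0), 2: (0.0, 1.0)}
    print(decode_assignment(adj, sample, sampled_vectors, random.Random(0)))
```

In the dense regime considered in the text, a typical variable has about $\Delta \cdot |S|/n$ sampled neighbors, so the case of a variable with no sampled neighbor is rare once the hidden constants are large enough; the sketch simply skips such variables rather than modelling how the proof handles them.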

2.3 Negative results for subsampling 

We now briefly sketch why, unlike the case for $k$-CSPs, subsampling sometimes fails for strong semidefinite
and linear programs; see Section 6 for more details. The idea is simple: many integrality gap examples, for
both LP and SDP hierarchies, are actually obtained from random instances. Examples include Schoenebeck's
result [Sch08] showing random 3SAT is an integrality gap example for the Lasserre SDP hierarchy, and
results showing that random graphs (and more generally good expanders) are integrality gap examples for
linear programming hierarchies for Max Cut [dlVKM07, CMM09]. Such random instances can be thought
of as subsamples of a sufficiently dense instance. But sufficiently strong SDP or LP programs will succeed in
certifying that a dense instance has small value. Thus these integrality gaps give an example of a CSP $\mathcal{P}$ where
$\Pi(\mathcal{P})$ is small, where $\Pi$ is a sufficiently strong linear or semidefinite program, but $\Pi(\mathcal{P}')$ is close to 1 for a
random induced sub-instance $\mathcal{P}'$ of $\mathcal{P}$. Note that indeed for unique games random graphs are actually easy
for semi-definite programs [AKK+08], explaining perhaps why subsampling for unique games is possible for
semi-definite programs but not for linear programs.

3 Preliminaries 

Let $G$ be a $\Delta$-regular graph with vertex set $V = [n]$ and edge set $E$ (no parallel edges or self-loops). We give
weight $2/(\Delta n)$ to each edge of $G$ so that every vertex of $G$ has (weighted) degree $2/n$ and $G$ has total edge weight
1. We say a graph is normalized if it has total edge weight 1. (We choose this normalization, because we
will often think of a graph as a probability distribution over unordered vertex pairs.) For a graph $G$ as above
and a vertex subset $U \subseteq V$, let $G[U]$ denote the induced subgraph on $U$. To preserve our normalization, we
scale the weights of the edges of $G[U]$ such that the total edge weight in $G[U]$ remains 1. We denote by $V_\delta$ a
random subset of a $\delta$ fraction of the vertices in $V$, and hence $G[V_\delta]$ denotes a random induced subgraph of $G$
of size $\delta|V|$. With our normalization, the typical weight of an edge in $G[V_\delta]$ is $2/(\delta^2 \Delta n)$.
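
The normalization can be checked numerically on a small example. The sketch below uses a circulant graph as a convenient regular graph; that choice and the helper names are ours. Each edge of a $\Delta$-regular graph on $n$ vertices gets weight $1/|E| = 2/(\Delta n)$, and an induced subgraph is rescaled so that its total weight is again 1.

```python
import random

def circulant_graph(n, offsets):
    """Edges {u, u + s mod n}; 2*len(offsets)-regular if offsets are distinct and < n/2."""
    edges = set()
    for u in range(n):
        for s in offsets:
            v = (u + s) % n
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)

def normalize(edges):
    """Uniform edge weights summing to 1."""
    w = 1.0 / len(edges)
    return {e: w for e in edges}

def induced_normalized(edges, subset):
    """Induced subgraph on `subset`, rescaled to total edge weight 1."""
    subset = set(subset)
    kept = [e for e in edges if e[0] in subset and e[1] in subset]
    return normalize(kept) if kept else {}

def weighted_degree(weights, v):
    """Sum of the weights of the edges incident to v."""
    return sum(w for (a, b), w in weights.items() if v in (a, b))

if __name__ == "__main__":
    n, offsets = 12, (1, 2)            # a 4-regular graph on 12 vertices
    edges = circulant_graph(n, offsets)
    weights = normalize(edges)
    print(weights[edges[0]], weighted_degree(weights, 0))
    subset = random.Random(0).sample(range(n), 6)
    sub = induced_normalized(edges, subset)
    print(len(sub), sum(sub.values()))
```

Summing incident edge weights gives $\Delta \cdot 2/(\Delta n) = 2/n$ per vertex; equivalently, a uniformly random endpoint of a random edge is each fixed vertex with probability $1/n$.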

^7 Note that, ignoring constant factors, $\mathcal{G}$ has roughly $n\Delta$ constraints, $\mathcal{G}'$ has $n/\Delta$ constraints, $\mathcal{H}$ has $n\Delta^3$ constraints, and $\mathcal{H}'$ has
$n\Delta$ constraints; the latter fact is some indication why one may hope to decode an assignment to $\mathcal{H}'$ into an assignment to $\mathcal{G}$.

^8 Our "niceness conditions" will ensure that if the inner-product matrix of the original assignment was in $M$ then the same
will hold for the decoded assignment. Also we will flip the vector if the corresponding permutation on $\{0, 1\}$ was $a \mapsto \lnot a$, but in
the discussion below we assume that all permutations involved are the identity; this simplifies notation and is immaterial to this
argument.
Max $k$-CSPs. A $k$-CSP instance $\mathcal{P}$ is a set of predicates (or pay-off functions) of the form $P : [q]^n \to \mathbb{R}$,
where every $P = P(x_{i_1}, x_{i_2}, \ldots, x_{i_k})$ is a $k$-junta, meaning it depends only on $k$ of the $n$ variables in $x$. We'll
think of $\mathrm{Var}(P) = (i_1, \ldots, i_k)$ as an ordered set and denote the $r$-th variable by $\mathrm{Var}_r(P) = i_r$. Without loss of
generality we may assume that in each predicate $P \in \mathcal{P}$, all $k$ variables are distinct. The norm of a pay-off
function is defined as $\|P\| \stackrel{\mathrm{def}}{=} \max_{x \in [q]^n} |P(x)|$, and we put $\|\mathcal{P}\| = \sum_{P \in \mathcal{P}} \|P\|$.

We think of $\mathcal{P}$ itself as a mapping $\mathcal{P} : [q]^n \to \mathbb{R}$ defined as $\mathcal{P}(x) \stackrel{\mathrm{def}}{=} \frac{1}{|\mathcal{P}|} \sum_{P \in \mathcal{P}} P(x)$. The optimum is
denoted $\mathrm{opt}(\mathcal{P}) = \max_{x \in [q]^n} \mathcal{P}(x)$. We will typically assume that $\|P\| \le 1$ for all $P \in \mathcal{P}$, in which case $\mathrm{opt}(\mathcal{P}) \le 1$.
For a subset $U \subseteq [n]$, with $|U| = \delta n$, we let $\mathcal{P}_U$ denote the $k$-CSP $\mathcal{P}_U = \{\delta^{-k} P : P \in \mathcal{P}, \mathrm{Var}(P) \subseteq U\}$. In this
case, we define $\mathcal{P}_U(x) = \frac{1}{|\mathcal{P}|} \sum_{P \in \mathcal{P}_U} P(x)$.

Unique Games. A unique game $\mathcal{G}$ is given by a constraint graph $G = (V, E)$, an alphabet $[R]$ and constraints
$\pi_{v \leftarrow u}$ for each edge $e = (u, v) \in E$. An assignment $x \in [R]^n$ satisfies the edge $e$ if $\pi_{v \leftarrow u}(x_u) = x_v$. It will be
convenient for us to define unique games as a minimization problem in which the objective is to minimize the
number of unsatisfied constraints. Note that throughout the introduction Unique Games was a maximization
problem, but these two views are equivalent. As a minimization problem, unique games has the following SDP
relaxation (which is closely related to the BasicSDP program mentioned before):

$\mathrm{sdp}(\mathcal{G}) \stackrel{\mathrm{def}}{=} \min \; \mathbb{E}_{(u,v) \in E} \sum_{a \in [R]} \tfrac{1}{2} \| u_a - v_{\pi_{v \leftarrow u}(a)} \|^2$ (1)

subject to the constraints that $\sum_{a \in [R]} \|u_a\|^2 = 1$ for every $u \in V$ and $\langle u_a, u_b \rangle = 0$ for all $u \in V$ and $a \ne b$.
An SDP solution is a positive semidefinite $(V \times [R]) \times (V \times [R])$ matrix written as $(X_{ua,vb})_{u,v \in V,\, a,b \in [R]}$ so that
$X_{ua,vb} = \langle u_a, v_b \rangle$. We will denote by $M_2$ the set of such matrices that satisfy the three constraints above. We
will write $\mathrm{sdp}(\mathcal{G})[X]$ to denote the value of $\mathrm{sdp}(\mathcal{G})$ under the particular solution $X$. We denote by $\mathcal{G}[U]$ the
unique game $\mathcal{G}$ restricted to the constraint graph $G[U]$.

4 Subsampling theorem for Max-k-CSPs

We will now state our subsampling theorem for $k$-CSPs and, as a direct application, obtain subsampling
theorems for basic semidefinite relaxations of $k$-CSPs. To state the theorem we need a notion of density of a
$k$-CSP. For 2-CSPs we will use the standard notion of density in a graph. Specifically, we will say a 2-CSP is
$\Delta$-dense if every vertex has $\Theta(\Delta)$ neighbors. For $k$-CSPs when $k > 2$ a natural generalization is to demand
that after assigning $k - 1$ out of $k$ coordinates in each constraint, there are still $\Theta(\Delta)$ constraints remaining. In
this case we say that the $k$-CSP is $\Delta$-dense.

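Theorem 4.1. Let $\varepsilon > 0$, $\Delta \ge 1$. Let $\mathcal{P}$ be a $\Delta$-dense $k$-CSP in $n$ variables over an alphabet of size $q$ so that $|P| \le 1$ for all $P \in \mathcal{P}$. Put $\delta \ge \varepsilon^{-C} \log(q)/\Delta$ for some absolute constant $C$. Suppose $U \subseteq [n]$ is chosen uniformly at random so that $|U| = \delta n$. Then,

\[ \left| \mathbb{E}\, \mathrm{opt}(\mathcal{P}_U) - \mathrm{opt}(\mathcal{P}) \right| \le \varepsilon . \tag{2} \]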

The formal density condition and the proof of this theorem are given in Section 8. We instead proceed to
discuss the applications of this theorem.

4.1 Subsampling basic semidefinite programs

The above subsampling theorem for $k$-CSPs can actually be used to give a general subsampling theorem for basic semidefinite programs. A semidefinite program is called basic if it can be written as a 2-CSP $\mathcal{Q}$ in $n$ variables (over an infinite alphabet) of the following form:

\[ \mathrm{opt}(\mathcal{Q}) = \max \mathbb{E}_{i,j \in [n]} P_{ij}\big( (v_{i,a})_{a \in [R]}, (v_{j,b})_{b \in [R]} \big) \tag{3} \]

where $\mathcal{Q} = \{P_{ij}\}_{i,j \in [n]}$ so that each $P_{ij}$ is a continuous function satisfying a Lipschitz condition in the inner products $\langle v_{i,a}, v_{j,b} \rangle$. Here the maximum is taken over a bundle of $R$ vectors $(v_{i,a})_{a \in [R]}$ per variable $i \in [n]$. We further require that each constraint on the vectors involves only vectors from the same bundle $\{v_{i,a}\}_{a \in [R]}$ (such as $\|v_{i,a}\|^2 \le 1$ or $\sum_{a \in [R]} \|v_{i,a}\|^2 = 1$). We also assume that $|P_{ij}| \le 1$.

It is crucial here that the maximization is over a product space of $n$ coordinates. Each coordinate corresponds to one vector bundle $(v_{i,a})_{a \in [R]}$. Still we cannot yet apply our subsampling theorem, because each coordinate is maximized over a continuous space, i.e., $(B^d)^R$. However, using dimension reduction as in [RS09], the dimension of the vectors can be assumed to be $\mathrm{poly}(1/\varepsilon)$ without changing the objective value by more than $\varepsilon/2$. Once the dimension is small we can discretize the space by an $\varepsilon'$-net (for small enough $\varepsilon'$), changing the inner products again only by $\varepsilon/2$. Hence we have the following lemma.

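Lemma 4.2. Let $\mathcal{Q}$ be a $\Delta$-dense 2-CSP of the form (3). Then there is a 2-CSP $\mathcal{Q}'$ with alphabet size at most $2^{\mathrm{poly}(1/\varepsilon)}$ such that $|\mathrm{opt}(\mathcal{Q}) - \mathrm{opt}(\mathcal{Q}')| \le \varepsilon$.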
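To illustrate the discretization step, here is a minimal sketch that swaps the $\varepsilon'$-net for the simplest possible one: rounding every coordinate to a grid of width `eta`. This is an assumption made for illustration, not the construction of [RS09]. For vectors in the unit ball of $\mathbb{R}^d$, rounding moves each vector by at most $r = \eta\sqrt{d}/2$, so inner products change by at most $2r + r^2$; all function names are hypothetical.

```python
import math
import random

def round_to_grid(v, eta):
    """Snap each coordinate to the nearest multiple of eta (moves it by <= eta/2)."""
    return [eta * round(c / eta) for c in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def inner_product_error_bound(d, eta):
    """|<u,v> - <u',v'>| <= r*|v| + |u'|*r <= 2r + r^2 for u, v in the unit ball."""
    r = eta * math.sqrt(d) / 2
    return 2 * r + r * r

def random_ball_vector(d, rng):
    """A vector of Euclidean norm at most 1 (rescaled only if it is too long)."""
    v = [rng.uniform(-1, 1) for _ in range(d)]
    nrm = math.sqrt(dot(v, v))
    return [c / max(nrm, 1.0) for c in v]

rng = random.Random(0)
d, eta = 8, 0.01
worst = 0.0
for _ in range(200):
    u, v = random_ball_vector(d, rng), random_ball_vector(d, rng)
    err = abs(dot(u, v) - dot(round_to_grid(u, eta), round_to_grid(v, eta)))
    worst = max(worst, err)
```

The grid has roughly $(2/\eta)^d$ points per vector, which is the reason the dimension has to be reduced before discretizing.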

This shows that we do have a strong subsampling theorem for any basic semidefinite program: 

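Corollary 4.3. Let $\mathcal{Q}$ denote a basic semidefinite program. Assume $\mathcal{Q}$ is $\Delta$-dense and let $\varepsilon > 0$. Then,

\[ \left| \mathbb{E}\, \mathrm{opt}(\mathcal{Q}_U) - \mathrm{opt}(\mathcal{Q}) \right| \le \varepsilon , \tag{4} \]

where $U \subseteq [n]$ is a randomly chosen set of size $\varepsilon^{-C} n/\Delta$ for sufficiently large $C > 0$.

Proof. After applying Lemma 4.2, we can use Theorem 4.1 to conclude the claim. Note that the alphabet size of $2^{\mathrm{poly}(1/\varepsilon)}$ translates into a factor $\mathrm{poly}(1/\varepsilon)$ in sample size. ■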

We will next demonstrate that both BasicSDP for $k$-CSPs and the Unique Games SDP are in fact basic
relaxations of the above form and therefore have a strong subsampling theorem.

For the Unique Games SDP this is immediate after changing it from a minimization problem to a 
maximization problem. (We can simply multiply the objective by -1.) Note that the SDP relaxation for 
unique games corresponds to a dense 2-CSP if this is the case for the constraint graph. We remark that the 
same is true for the difference of two dense Unique Games relaxations and this is the case that will be used in 
the proof of our main theorem later (Section 7). 

More generally, the same can be done for the BasicSDP relaxation of any $k$-CSP. Raghavendra [Rag08] defined BasicSDP for a $k$-CSP $\mathcal{P} = (P_1, \ldots, P_m)$ with $|P_t| \le 1$ over the alphabet $[R]$ as the program

\[ \max \mathbb{E}_{t \in [m]}\, \mathbb{E}_{x \sim \mu_t} P_t(x) \]

subject to the constraint that $\Pr_{x \sim \mu_t}\{x_i = a, x_j = b\} = \langle v_{i,a}, v_{j,b} \rangle$ for all $t \in [m]$, $i, j \in \mathrm{Var}(P_t)$, and $a, b \in [R]$. The maximum is taken over all ensembles $(v_{i,a})$ of unit vectors and $R^k$-tuples of variables $\mu_t$, each of which is required to be a probability distribution on $\mathrm{Var}(P_t)$. Let $\mathrm{violate}(t)$ denote the sum of $|\Pr_{x \sim \mu_t}\{x_i = a, x_j = b\} - \langle v_{i,a}, v_{j,b} \rangle|$ over all $i, j \in \mathrm{Var}(P_t)$ and $a, b \in [R]$. While the constraint of [Rag08] is that $\mathrm{violate}(t) = 0$ for all $t \in [m]$, we follow [RS09, Ste10], which replaced this with the constraint $\mathbb{E}_{t \in [m]} \mathrm{violate}(t) \le \varepsilon$ and showed the two programs are approximately equivalent up to a $\mathrm{poly}(\varepsilon)$ perturbation of the instance. As shown in [Ste10], because there are only a few ($R^2 k^2$) constraints per pay-off function $P_t$, we can introduce this penalty function into the objective function, adding the term $-\mathbb{E}_{t \in [m]} \mathrm{violate}(t)/\varepsilon$ to the expression we maximize. Hence, for our purposes we may assume that BasicSDP has the form in (3) so that our subsampling theorem applies. We stress that this approach can only work since there are only a few constraints for each pay-off function of $\mathcal{P}$. The approach breaks down in the presence of constraints that involve arbitrary combinations of variables, such as $\ell_2^2$ triangle inequalities. In this case it is no longer possible to assign a meaningful penalty to each constraint.

In the case of BasicLP similar arguments apply. BasicLP is the same as BasicSDP except that we don't require the probability distributions to be realized as inner products of vectors. Two distributions $\mu_P$ and $\mu_{P'}$ are however required to be consistent whenever they share a variable. These constraints can be written in the objective function and this results in a 2-CSP to which our subsampling theorem applies.

Application to property testing positive definite matrices. Our subsampling theorem also applies to quadratic forms and this can be very useful. We illustrate one application in the context of property testing. Specifically, we will get a property testing algorithm for the class of positive semidefinite matrices. Let us say that a matrix $B$ is $\varepsilon$-far from positive semidefinite if there exists a vector $x$ with $\|x\|_\infty \le 1$ such that $-\varepsilon \ge \langle x, Bx \rangle = \sum_{ij} b_{ij} x_i x_j$. Recall that $B$ is positive semidefinite if and only if $\langle x, Bx \rangle \ge 0$ for all $x$. Notice we could have defined distance in terms of the operator norm, which is to say that there exists an $x$ with $\|x\|_2 \le 1$ such that $\langle x, Bx \rangle \le -\varepsilon$. However, since every vector $x$ of Euclidean norm 1 also satisfies $\|x\|_\infty \le 1$, this would only be a stronger notion of "$\varepsilon$-far", thus applying to fewer matrices. Note that the expression $\max_{x : \|x\|_\infty \le 1} \langle x, Bx \rangle$ is a 2-CSP to which we can apply our subsampling theorem (after discretization of the domain). This lets us distinguish, from a small subsample, between matrices that are positive semidefinite and those that are $\varepsilon$-far. Formally, we get the following corollary. The simple proof is omitted.

Corollary 4.4. Let $B$ be a matrix with $\|B\|_\infty \le D/n^2$. Then there is a property testing algorithm $\mathcal{A}$ such that: If $B$ is $\varepsilon$-far from being positive semidefinite, then $\mathcal{A}$ rejects $B$ with probability greater than $2/3$. If $B$ is positive semidefinite, then $\mathcal{A}$ rejects $B$ with probability less than $1/3$. Furthermore, $\mathcal{A}$ reads only $\mathrm{poly}(D, \varepsilon^{-1})$ many entries of $B$ and runs in time $\mathrm{poly}(D, \varepsilon^{-1})$.

5 Max Cut in random geometric graphs 

In this section we discuss the application of our theorem to solving Max Cut in random geometric graphs. Let us first recall some basic facts. The value of the maximum cut of a graph $G$ is given by $\mathrm{opt}(G) := \max_{x \in \{-1,1\}^n} \langle x, \tfrac{1}{4} L(G) x \rangle$. Here $L(G)$ denotes the combinatorial Laplacian of $G$. The Goemans-Williamson [GW94] relaxation for Max Cut is $\mathrm{sdp}(G) = \max \{ \tfrac{1}{4} L(G) \bullet X \mid X \succeq 0,\ \forall i \colon X_{ii} = 1 \}$. Note that $\mathrm{opt}(G)$ and $\mathrm{sdp}(G)$ range between $0$ and $1$, the total edge weight of a normalized graph. We will consider relaxations obtained by adding valid constraints to the above program. A specific set of constraints we'll be interested in are the triangle inequalities, which can be expressed by adding the constraints $X_{ij} + X_{jk} - X_{ik} \le 1$ and $X_{ij} + X_{jk} + X_{ik} \ge -1$ for every $i, j, k \in V$. The relaxation including triangle inequalities will be denoted $\mathrm{sdp}_3(G)$.
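The role of the triangle inequalities can be seen on the smallest example. The sketch below is illustrative only (the helper names and the triangle graph are assumptions, not from the paper): it checks that every integral solution $X = xx^{\top}$ satisfies both inequalities, while the Gram matrix of three unit vectors at pairwise angle 120 degrees, which gives SDP value $3/4$ on the triangle against a true maximum cut of $2/3$, violates them.

```python
from itertools import product, permutations

def cut_value(edges, x):
    """<x, (1/4) L(G) x> on a normalized graph: the fraction of edges cut by x."""
    return sum((x[i] - x[j]) ** 2 / 4 for i, j in edges) / len(edges)

def max_cut(edges, n):
    """Brute-force opt(G) over all x in {-1, 1}^n (tiny n only)."""
    return max(cut_value(edges, x) for x in product((-1, 1), repeat=n))

def sdp_objective(edges, X):
    """(1/4) L(G) . X on a normalized graph, written edge by edge."""
    return sum((X[i][i] + X[j][j] - 2 * X[i][j]) / 4 for i, j in edges) / len(edges)

def satisfies_triangle_inequalities(X, tol=1e-9):
    """Check X_ij + X_jk - X_ik <= 1 and X_ij + X_jk + X_ik >= -1 for all triples."""
    for i, j, k in permutations(range(len(X)), 3):
        if X[i][j] + X[j][k] - X[i][k] > 1 + tol:
            return False
        if X[i][j] + X[j][k] + X[i][k] < -1 - tol:
            return False
    return True

triangle = [(0, 1), (1, 2), (0, 2)]
signs = (1, -1, 1)
integral = [[a * b for b in signs] for a in signs]            # X = x x^T
spread = [[1.0 if i == j else -0.5 for j in range(3)] for i in range(3)]  # 120 degrees
```

Here `spread` is a feasible solution of the basic relaxation (it is a Gram matrix with unit diagonal), so the triangle inequalities genuinely cut off part of the feasible region.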

Sphere graphs. We denote by $G_\gamma$ the graph on the vertex set $V = S^{d-1}$ with edge set $E = \{(u, v) \in V \times V \mid \tfrac{1}{4}\|u - v\|^2 \ge 1 - \gamma\}$. The integral value of $G_\gamma$, denoted $\mathrm{opt}(G_\gamma)$, is defined as the maximum of $\mu(A, \bar{A}) = \mu^2(\{(x, y) \in E : x \in A, y \notin A\})$ taken over all measurable subsets $A \subseteq S^{d-1}$. Here, $\mu$ denotes the uniform surface measure of the sphere $S^{d-1}$ and $\mu^2 = \mu \times \mu$. A theorem of Feige and Schechtman shows that the maximum is attained for any hemisphere.

Theorem 5.1 (Feige-Schechtman [FS02]). Fix $\gamma \in [0, 1]$ and consider the graph $G_\gamma$. Then, the maximum of $\mu(A, \bar{A})$ over all measurable subsets $A \subseteq S^{d-1}$ is attained for any hemisphere $H \subseteq S^{d-1}$.

Recall, if $A$ is a hemisphere, $\mu(A, \bar{A}) = 1 - \Theta(\sqrt{\gamma})$. Hence $\mathrm{opt}(G_\gamma) = 1 - \Theta(\sqrt{\gamma})$. At this point we mention that the SDP relaxation for Max Cut is well-defined on infinite graphs, though we omit the formal details. In this case it is easiest to think of $E$ as a distribution over edges so that the SDP maximizes the quantity $\mathbb{E}_{(u,v) \sim E} \tfrac{1}{4}\|f(u) - f(v)\|^2$ over all embeddings $f \colon V \to B$ satisfying the usual additional constraints. Here $B$ can be taken to be the unit ball of the infinite dimensional Euclidean space.

The sphere graph itself can then be interpreted as an SDP solution, hence the following fact. 

Fact 5.2 (Basic SDP). $\mathrm{sdp}(G_\gamma) \ge 1 - \gamma$.

Proof. The graph itself gives an embedding (the identity embedding) such that for each edge $(u, v) \in E$, $\tfrac{1}{4}\|u - v\|^2 \ge 1 - \gamma$. Since the SDP averages this quantity over all edges in the graph, the claim follows. ■
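The argument for Fact 5.2 carries over verbatim to a finite sample of the sphere, which is easy to check numerically. The following is a minimal sketch under assumed parameters (80 random points in $d = 3$, $\gamma = 0.3$, a fixed seed; all names are illustrative): it builds the induced sphere graph and evaluates the identity embedding.

```python
import math
import random

def random_unit_vector(d, rng):
    """A uniformly random point on S^{d-1}, via a normalized Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    nrm = math.sqrt(sum(c * c for c in v))
    return [c / nrm for c in v]

def quarter_sq_dist(u, v):
    """(1/4) * |u - v|^2, the contribution of the pair (u, v) to the SDP objective."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / 4

def sphere_graph_edges(points, gamma):
    """Edges of G_gamma induced on the sample: pairs with (1/4)|u - v|^2 >= 1 - gamma."""
    n = len(points)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if quarter_sq_dist(points[i], points[j]) >= 1 - gamma]

def identity_embedding_value(points, edges):
    """SDP objective of the identity embedding, averaged over the edges."""
    return sum(quarter_sq_dist(points[i], points[j]) for i, j in edges) / len(edges)

rng = random.Random(1)
gamma = 0.3
points = [random_unit_vector(3, rng) for _ in range(80)]
edges = sphere_graph_edges(points, gamma)
value = identity_embedding_value(points, edges)
```

Every edge contributes at least $1 - \gamma$ by construction, so the average does too; this says nothing about the triangle inequalities, which the identity embedding need not satisfy.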

We will show next that triangle inequalities change the value of the SDP from $1 - \gamma$ to $1 - \Omega(\sqrt{\gamma})$, thus capturing the integral value up to constant factors in front of $\gamma$.

Lemma 5.3. $\mathrm{sdp}_3(G_\gamma) \le 1 - \Omega(\sqrt{\gamma})$.

This lemma was quite possibly previously known, but we will give a proof in Section B for lack of a 
reference. Using standard discretization arguments all previous lemmas can be transferred to a sufficiently 
dense discretization of the continuous sphere. Similarly, it is not difficult to show that sufficiently many 
random points from the sphere will give a good discretization. 

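Lemma 5.4. Fix $\gamma \in [0, 1]$, $d \in \mathbb{N}$. Then, there exists an $n_0(d, \gamma) \in \mathbb{N}$ so that if we pick $V \subseteq S^{d-1}$ uniformly at random with $|V| \ge n_0$, then the induced subgraph $G_\gamma[V]$ satisfies (1) $\mathrm{opt}(G_\gamma[V]) = 1 - \Theta(\sqrt{\gamma})$, and (2) $\mathrm{sdp}_3(G_\gamma[V]) = 1 - \Theta(\sqrt{\gamma})$.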

The proof is given in Section B. It is worth noting that the proof of the previous lemma gives a very weak
bound on the number of vertices that we are required to subsample. In particular, it is not difficult to see that
the average degree of the graph will be $n^{\Omega(1)}$. A priori, it could therefore be the case that the SDP value
changes when considering a subsample of the sphere with average degree $\log(n)$ or even $O(1)$. Indeed, [FS02]
show that for some fixed $\gamma$, a random subsample of the sphere of expected degree $O(\log n)$ will satisfy most
triangle inequality constraints with high probability, thus exhibiting some integrality gap for $\mathrm{sdp}_3$.$^1$ However,
our main theorem in this section implies that asymptotically $\mathrm{sdp}_3$ behaves like $1 - \sqrt{\gamma}$ rather than $1 - \gamma$.

To argue this, we'd like to use our subsampling theorem for unique games. It is well known how to express the max-cut problem on a graph $G$ as an instance $\mathcal{G}$ of Unique Games where the constraint graph is exactly $G$. Since we defined unique games to be minimization problems, this corresponds to minimizing the number of uncut edges. We therefore have that $\mathrm{opt}(G) = 1 - \mathrm{opt}(\mathcal{G})$ and furthermore it is well known that $\mathrm{sdp}(G) = 1 - \mathrm{sdp}(\mathcal{G})$ for the basic SDP relaxation and also $\mathrm{sdp}_3(G) = 1 - \mathrm{sdp}_3(\mathcal{G})$, where the latter refers to an SDP relaxation for Unique Games that includes triangle inequalities, yielding the following theorem:

$^1$In their example the angle between two neighboring vertices is chosen to be more than 60 degrees, corresponding to very large $\gamma$ to which our theorem does not apply due to the constant factor loss in $\gamma$.

Theorem 5.5. Fix $\gamma \in [0, 1]$ and let $\Delta \ge \mathrm{poly}(1/\gamma)$. Fix $d$ and choose $n$ such that for $n$ uniformly random points $V \subseteq S^{d-1}$ the induced graph $G_\gamma[V]$ has expected degree $\Delta$. Then,

\[ \mathrm{sdp}_3(G_\gamma[V]) = 1 - \Theta(\sqrt{\gamma}) . \]

Proof. We think of $G_\gamma[V]$ as a uniform vertex subsample of a random dense discretization $G_\gamma[W]$ in $d$ dimensions. Note that by Lemma 5.4 we have $\mathrm{sdp}_3(G_\gamma[W]) = 1 - \Theta(\sqrt{\gamma})$. We can reduce $G_\gamma[W]$ to a unique game $\mathcal{G}$ so that $\mathrm{sdp}_3(\mathcal{G}) = \Theta(\sqrt{\gamma})$. Now $G_\gamma[V]$ corresponds to the unique game $\mathcal{G}[V]$, since the constraint graph of $\mathcal{G}[V]$ is precisely $G_\gamma[V]$. By Theorem 7.2 (subsampling theorem for Unique Games), we know that $\mathrm{sdp}_3(\mathcal{G}[V]) = \Theta(\sqrt{\gamma})$. Note that triangle inequalities correspond to a reasonable relaxation. But then it follows that $\mathrm{sdp}_3(G_\gamma[V]) = 1 - \Theta(\sqrt{\gamma})$. ■

Theorem 1 is a corollary of this theorem, since sdp3 can now be used to certify that random geometric 
graphs have small max-cut value. 

6 Negative results for subsampling 

In this section we first observe that it is impossible to obtain even a weak subsampling result for the
semidefinite programming relaxation of $k$-CSPs with $k \ge 3$. This result follows from Schoenebeck's
integrality gap [Sch08]. We also argue that even in the case of 2-CSPs subsampling is impossible when the
constraints are not unique.

Second, we give a separation between semidefinite programming and linear programming by showing 
that a subsampling result for linear programming is impossible even in the case of Max Cut and Unique 
Games. Here, our results are based on the integrality gap construction of [CMM09]. 

6.1 No subsampling for SDP relaxations of $k$-CSPs with $k \ge 3$

Theorem 6.1. There is a k-CSP P with Q.{n^) constraints in the variables [n] so that sdpQ(^^^(P) < 0.51, but 
with high probability sdpQ^Q^^{P[U]) ^ 0.99 where U Q [n] is a random set of size 5n with 5 > c/n^''^'' for 
some constant c. 

Proof sketch. We may take $\mathcal{P}$ to be a random dense instance of $k$-XOR. It is known that an SDP with a
constant number of rounds of Lasserre captures the integral value of the CSP. Now $\mathcal{P}[U]$ is a $k$-XOR instance
with Q.{6^n'^) = Cn constraints for some constant C. For large enough C , the result of [Sch08] then implies 
the claim. ■ 

6.2 No subsampling for SDP relaxations of non-unique 2-CSPs 

The above result also shows that we cannot hope for a subsampling theorem for semidefinite relaxations of non-unique 2-CSPs. Indeed, we can take a dense instance $\mathcal{P}$ of 3-SAT and express it as a 2-CSP $\mathcal{P}'$ as follows: Every constraint $P \in \mathcal{P}$ gets mapped to a new variable $x_P$ over the alphabet $[8]$. Each label represents an assignment to the original constraint. Every two constraints sharing one variable in $\mathcal{P}$ contribute one constraint $P' \in \mathcal{P}'$ which enforces that the assignment to the shared variable is consistent.

Subsampling variables in $\mathcal{P}'$ corresponds to subsampling constraints in $\mathcal{P}$. Using [Sch08], the subsample of $\mathcal{P}$ will be a gap instance for the Lasserre hierarchy. Since our reduction is local, ideas of [Tul09] show that the subsample of $\mathcal{P}'$ will also be a gap instance. This rules out the possibility of a subsampling theorem for non-unique 2-CSPs of alphabet size 8.

6.3 No subsampling for LP relaxations of 2-CSPs



In this section we rule out subsampling theorems for strong linear programming relaxations even in the case of Max Cut, for which strong semidefinite relaxations do admit a subsampling theorem. Specifically, we consider the Sherali-Adams LP relaxation for Max Cut: $\mathrm{lp}_r(G) = \max \sum_{(u,v) \in E} x_{uv}$ s.t. the vector $(x_{uv})_{u,v \in V}$ lies in the Sherali-Adams relaxation of the cut polytope.

The Sherali-Adams relaxation of the cut polytope is obtained by applying $r$ rounds of lift-and-project operations to the base set of linear inequalities that define the metric polytope, i.e., $\{x_{ij} + x_{jk} \ge x_{ik},\ x_{ij} + x_{jk} + x_{ik} \le 2,\ x_{ij} = x_{ji},\ 1 \ge x_{ij} \ge 0\}$. For a formal definition see, for instance, [CMM09].

The next theorem shows that there are graphs which have Sherali- Adams value bounded away from 1 
for a constant number of rounds. But after subsampling the value comes arbitrarily close to 1 even when 
considering a huge number of rounds. 

Theorem 6.2. For every function $\varepsilon = \varepsilon(n)$ that tends to $0$ with $n$, there exists a function $r = r(n)$ that tends to $\infty$ with $n$ and a family of graphs $\{G_n\}$ of degree $D = D(n)$ such that

1. For every $n$, $\mathrm{lp}_3(G_n) \le 0.8$.

2. IfG' is a random subgraph of G of size {n / D)^'*'^''"^ then E[lp,-(„)(G')] ^ 1 - 

where $\mathrm{lp}_k(H)$ denotes the value of $k$ levels of the Sherali-Adams linear program for Max-Cut on the graph $H$.

Proof sketch. Let G„ = G„_p for some p ^. It is not difficult to argue that three rounds of Sherali-Adams 
have value at most 0.7 on $G = G_n$ with high probability over $G_n$ itself. This follows by considering triangles
in $G$ and arguing that every edge in $G$ occurs in the same number of triangles up to negligible deviation. But
3 rounds of Sherali-Adams have value at most $2/3$ on a triangle. Hence, $\mathrm{lp}_3(G) \le 2/3 + o(1)$.

On the other hand let <5 = |^ where D = pn is the expected degree of G. We observe that G' = G[Vs] is 
exactly distributed like G' = Gmj/m for m = {njD)^^^ and A = m^. Using arguments similar to [ABLT06], 
one can check that such graphs have girth going to infinity, and for some $M \in \omega(1)$, all subsets of size $M$ are
$(1 + \eta)$-sparse, where $\eta \in o(1)$. Hence, we can follow the proof as above and use [CMM09] to argue that
$G[V_\delta]$ has Sherali-Adams value larger than $1 - o(1)$ for $\omega(1)$ rounds, and therefore picking $r(n)$ sufficiently
small concludes the proof sketch. ■

Remark 6.3. We remark that such expansion based arguments can be used to give similar results for
subsamples of any $\Delta$-regular graph and in particular for subsamples of the Feige-Schechtman graph.



7 Proof of the main theorem for Unique Games 

We now come to the proof of our main theorem — a weak subsampling theorem for strong SDP relaxations of 
Unique Games. Let us first formalize the notion of a "reasonable" SDP relaxation. 

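Definition 7.1 (Reasonable SDP relaxation for Unique Games). Let $V$ be a set of $n$ vertices and let $\mathcal{M}$ be a convex subset of the set $\mathcal{M}_1$ defined as

\[ \mathcal{M}_1 = \big\{ X \in \mathbb{R}^{(V \times [R]) \times (V \times [R])} \;\big|\; X \succeq 0,\ \forall i \in V, a \in [R] \colon X_{ia,ia} \le 1 \big\} . \]

For a unique game $\mathcal{G}$ on a graph $G$ with vertex set $V$, we define $\mathrm{sdp}_{\mathcal{M}}(\mathcal{G})$ by

\[ \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}) \stackrel{\mathrm{def}}{=} \min_{X \in \mathcal{M}} \mathbb{E}_{(u,v) \in E}\, \mathbb{E}_{a \in [R]} \|u_a - v_{\pi_{v \leftarrow u}(a)}\|^2 . \]

We say that $\mathrm{sdp}_{\mathcal{M}}$ is a reasonable relaxation for Unique Games if $\mathcal{M}$ is closed under renaming of coordinates
and permutation of labels in the sense that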

eM. 

Here, the function $F$ is not required to be bijective, but for every $u, v \in V$, $\pi_{v \leftarrow u}$ is a permutation of $[R]$. We
also say that $\mathcal{M}$ is reasonable if it satisfies the condition above.

For an SDP to be reasonable it is only needed that any set of vectors used for one vertex of the unique 
game can also be used in any other vertex, even after a permutation of the labels. 

The next theorem gives a subsampling result for any reasonable relaxation of Unique Games. 

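Theorem 7.2 (Main). Let $\varepsilon > 0$ and let $\mathcal{G}$ be a unique game on a $\Delta$-regular constraint graph. Then, for $\delta \ge \Delta^{-1} \cdot \mathrm{poly}(1/\varepsilon)$,

\[ \tfrac{1}{9}\, \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}) - \varepsilon \le \mathbb{E}\, \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}[V_\delta]) \le \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}) + \varepsilon , \]

where $\mathrm{sdp}_{\mathcal{M}}$ is any reasonable relaxation.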

The theorem is proven in the next two steps. 

7.1 First step: proxy graph theorem via subsampling theorem 

For our first step we'll need a special case of our subsampling theorem for semidefinite programs. It shows 
that under certain regularity conditions subsampling is possible for semidefinite programs that correspond 
roughly to the SDP of a unique game on a regular graph. 

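Lemma 7.3. Let $\varepsilon > 0$ and let $\mathcal{P}$ be a 2-CSP over $n$ variables of the form $\mathcal{P}(x) = \sum_{i,j \in V} b_{ij} P(x_i, x_j)$ where we interpret each variable $x_i$ as a collection of vectors $x_i = (v_{i,a})_{a \in [R]}$ and each pay-off function is bounded and of the form $P(x_i, x_j) = \sum_{a,b} d_{a,b} \langle x_{i,a}, x_{j,b} \rangle$. Assume that each $b_{ij} \le 1/\Delta n$ and $\sum_i b_{ij} = \Theta(1/n)$ for every $j$. Then, for $\delta \ge \mathrm{poly}(1/\varepsilon)/\Delta$,

\[ \mathbb{E}\, \mathrm{opt}(\mathcal{P}[V_\delta]) = \mathrm{opt}(\mathcal{P}) \pm \varepsilon . \]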

As shown in Section 4 this lemma can be derived easily from Theorem 4.1. We'll proceed to state and 
prove our proxy theorem. 

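Theorem 7.4 (Proxy Theorem). Let $\mathcal{G}$, $\mathcal{H}$ be unique games on $\Delta$-dense constraint graphs and suppose

\[ \mathrm{sdp}(\mathcal{G})[X] \ge c \cdot \mathrm{sdp}(\mathcal{H})[X] \tag{5} \]

for every SDP solution $X \in \mathcal{M}_1$. Then for $\delta \ge \Delta^{-1}\mathrm{poly}(1/\varepsilon)$, we have

\[ \mathbb{E}\, \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}[V_\delta]) \ge c \cdot \mathbb{E}\, \mathrm{sdp}_{\mathcal{M}}(\mathcal{H}[V_\delta]) - \varepsilon . \]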
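Proof of Theorem 7.4. Consider the 2-CSP instance

\[ \max_{X \in \mathcal{M}_1} \mathcal{P}(X), \qquad \mathcal{P}(X) = c \cdot \mathrm{sdp}(\mathcal{H})[X] - \mathrm{sdp}(\mathcal{G})[X] . \]

Note that by our assumption

\[ \mathrm{opt}(\mathcal{P}) = \max_{X \in \mathcal{M}_1} \mathcal{P}(X) \le 0 . \]

Let $A(G)$ and $A(H)$ denote the adjacency matrices of $G$ and $H$ respectively. Since $G$ is $\Delta$-regular and $H$ has degree at least $\Delta$, we know that each entry of $B = cA(H) - A(G)$ is bounded by $O(1/\Delta n)$, whereas each row/column in $B$ sums up to $O(1)$. Hence, the matrix $B$ satisfies the assumption of Lemma 7.3. It remains to check that in $\mathcal{P}$ each pay-off function is bounded. This follows from the fact that in both $\mathrm{sdp}(\mathcal{H})$ and $\mathrm{sdp}(\mathcal{G})$ each pay-off function is of the form $\sum_{a \in [R]} \|u_a - v_{\pi(a)}\|^2$, and this expression is bounded since each vector has norm at most 1, so that each pay-off function is bounded by $O(R)$.
Therefore, by Lemma 7.3,

\begin{align*}
\varepsilon \ge \mathbb{E}\, \mathrm{opt}(\mathcal{P}[V_\delta])
 &= \mathbb{E} \max_{X \in \mathcal{M}_1} \big( c \cdot \mathrm{sdp}(\mathcal{H}[V_\delta])[X] - \mathrm{sdp}(\mathcal{G}[V_\delta])[X] \big) \\
 &\ge \mathbb{E} \max_{X \in \mathcal{M}} \big( c \cdot \mathrm{sdp}(\mathcal{H}[V_\delta])[X] - \mathrm{sdp}(\mathcal{G}[V_\delta])[X] \big) \qquad \text{(since } \mathcal{M} \subseteq \mathcal{M}_1\text{)} \\
 &\ge \mathbb{E} \Big[ c \cdot \min_{X \in \mathcal{M}} \mathrm{sdp}(\mathcal{H}[V_\delta])[X] - \min_{X \in \mathcal{M}} \mathrm{sdp}(\mathcal{G}[V_\delta])[X] \Big] ,
\end{align*}

where the last step evaluates the maximum at a minimizer of $\mathrm{sdp}(\mathcal{G}[V_\delta])[X]$ over $\mathcal{M}$. Hence,

\[ \mathbb{E}\, \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}[V_\delta]) \ge c \cdot \mathbb{E}\, \mathrm{sdp}_{\mathcal{M}}(\mathcal{H}[V_\delta]) - \varepsilon . \]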



7.2 Second step: proxy graphs for unique games 

In this section, we show that taking the "third power" of a unique game results in a useful proxy graph. 

Definition 7.5 (Third power of a unique game). For a unique game $\mathcal{G}$ we define $\mathcal{G}^3$ to be the unique game defined on the third graph power of the constraint graph. An edge $e = (u, v)$ therefore corresponds to a path $(u, u', v', v)$ in $G$. The constraint $\pi_{v \leftarrow u}$ on the edge $(u, v)$ is defined as the composition of the three constraints along the path in $G$, that is

\[ \pi_{v \leftarrow u} = \pi_{v \leftarrow v'} \circ \pi_{v' \leftarrow u'} \circ \pi_{u' \leftarrow u} . \tag{6} \]

Lemma 7.6. Let $\mathcal{G}$ denote a $\Delta$-regular unique game. Then, for every SDP solution $X \in \mathcal{M}_1$,

\[ \mathrm{sdp}(\mathcal{G})[X] \ge \tfrac{1}{9} \cdot \mathrm{sdp}(\mathcal{G}^3)[X] . \]

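Proof. Let $X \in \mathcal{M}_1$ and let $(u, v)$ be an edge in $G^3$ corresponding to a path $(u, u', v', v)$ in $G$. Let $a \in [R]$ and put $a' = \pi_{u' \leftarrow u}(a)$, $b' = \pi_{v' \leftarrow u'}(a')$, and $b = \pi_{v \leftarrow v'}(b')$. Note that by definition of $\mathcal{G}^3$, we have $\pi_{v \leftarrow u}(a) = b$. By triangle inequalities

\[ \|u_a - v_b\| \le \|u_a - u'_{a'}\| + \|u'_{a'} - v'_{b'}\| + \|v'_{b'} - v_b\| . \]

Squaring both sides and taking expectation over $a \in [R]$, we get

\[ \mathbb{E}_{a \in [R]} \|u_a - v_b\|^2 \le 3 \mathbb{E}_{a \in [R]} \|u_a - u'_{a'}\|^2 + 3 \mathbb{E}_{a \in [R]} \|u'_{a'} - v'_{b'}\|^2 + 3 \mathbb{E}_{a \in [R]} \|v'_{b'} - v_b\|^2 . \]

Averaging over edges in $G^3$, we get

\[ \mathbb{E}_{(u,v) \in G^3}\, \mathbb{E}_{a \in [R]} \|u_a - v_{\pi_{v \leftarrow u}(a)}\|^2 \le 9\, \mathbb{E}_{(u,v) \in E}\, \mathbb{E}_{a \in [R]} \|u_a - v_{\pi_{v \leftarrow u}(a)}\|^2 . \]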
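The two constants in this proof can be unpacked as follows. This is a reading of the argument rather than text from the paper, and the second step assumes that a uniformly random edge of $G^3$ is a uniformly random length-3 path in the $\Delta$-regular graph $G$, so that each of its three steps is a uniformly random edge of $G$.

```latex
% Step 1: squaring the triangle inequality (Cauchy-Schwarz with the all-ones vector).
\[
  (x + y + z)^2 \le (1^2 + 1^2 + 1^2)(x^2 + y^2 + z^2) = 3\,(x^2 + y^2 + z^2).
\]
% Step 2: each of the three terms averages to the value on G. Since the
% constraints are permutations, a', b' are uniform over [R] when a is.
\[
  \mathbb{E}_{(u,v) \in G^3}\, \mathbb{E}_{a \in [R]} \|u_a - v_{\pi_{v \leftarrow u}(a)}\|^2
  \le 3 \cdot 3 \cdot \mathbb{E}_{(u,v) \in E}\, \mathbb{E}_{a \in [R]} \|u_a - v_{\pi_{v \leftarrow u}(a)}\|^2 .
\]
```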



Lemma 7.7. Let $\mathcal{G}$ be a $\Delta$-regular unique game on a graph $G = (V, E)$ and let $\mathcal{G}'$ be the unique game on a graph $G' = (V_\delta, E')$ defined by the edge distribution

- sample a random edge $(i, j)$ from $G$,

- choose $u$ and $v$ to be random neighbors of $i$ and $j$ in $V_\delta$ (if $i$ or $j$ have no neighbor in $V_\delta$, choose a random vertex in $V_\delta$),

- output $(u, v)$ as an edge in $E'$. The constraint on the edge $(u, v)$ is taken to be the composition of the constraints on $(u, i)$, $(i, j)$, $(j, v)$ the same way as in Definition 7.5.

Then for $\delta \ge \Delta^{-1}\mathrm{poly}(1/\varepsilon)$,

\[ \mathbb{E} \|G^3[V_\delta] - G'\|_{TV} \le \varepsilon . \]

Here, $\|\cdot\|_{TV}$ denotes statistical distance.
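The edge distribution in the lemma can be sketched as a sampling procedure. Everything below is an illustrative assumption: the data layout (`pi[(t, s)]` maps a label of `s` to the forced label of `t`, stored in both directions), the toy 4-cycle, and in particular the handling of the fallback case, where the sampled vertex is not a neighbor and the text does not say which constraint to use, so the sketch returns `None` there.

```python
import random

def compose(*perms):
    """compose(p3, p2, p1)[a] == p3[p2[p1[a]]]: the rightmost permutation acts first."""
    result = []
    for a in range(len(perms[0])):
        label = a
        for p in reversed(perms):
            label = p[label]
        result.append(label)
    return result

def sample_proxy_edge(edges, adj, pi, v_delta, rng):
    """Sample one edge (u, v) of G' together with its composed constraint."""
    i, j = rng.choice(edges)          # random edge (i, j) of G
    pool = sorted(v_delta)

    def pick(w):
        # Random neighbor of w inside V_delta, or a random vertex of V_delta.
        nbrs = [u for u in adj[w] if u in v_delta]
        if nbrs:
            return rng.choice(nbrs), True
        return rng.choice(pool), False

    u, u_is_nbr = pick(i)
    v, v_is_nbr = pick(j)
    if not (u_is_nbr and v_is_nbr):
        return u, v, None             # fallback case: constraint left unspecified
    # Constraints on (u, i), (i, j), (j, v), composed as in Definition 7.5.
    return u, v, compose(pi[(v, j)], pi[(j, i)], pi[(i, u)])

# Hypothetical toy instance: a 4-cycle with alphabet size 3 and shift constraints.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
shift, unshift = [1, 2, 0], [2, 0, 1]
pi = {}
adj = {w: [] for w in range(4)}
for s, t in cycle:
    pi[(t, s)] = shift
    pi[(s, t)] = unshift
    adj[s].append(t)
    adj[t].append(s)
v_delta = {0, 1, 2}
rng = random.Random(0)
samples = [sample_proxy_edge(cycle, adj, pi, v_delta, rng) for _ in range(50)]
```

In this toy instance every vertex has a neighbor in `v_delta`, so the fallback branch is never taken.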

Proof Sketch. If every vertex of $G$ has the same number of neighbors in $V_\delta$, then the two graphs $G^3[V_\delta]$ and $G'$ are identical. For $\delta \ge \Delta^{-1}\mathrm{poly}(1/\varepsilon)$, the following event happens with probability $1 - \varepsilon$: Most vertices of $G$ (all but an $\varepsilon$ fraction) have, up to a multiplicative $(1 + \varepsilon)$ error, the same number of neighbors in $V_\delta$. Conditioned on this event, it is possible to bound $\|G^3[V_\delta] - G'\|_{TV}$ by $O(\varepsilon)$. Assuming this fact, the lemma follows. The details can be found in Section A. ■

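Lemma 7.8. Let $\mathcal{G}$ be a unique game on a $\Delta$-regular constraint graph. Then for $\delta \ge \Delta^{-1} \cdot \mathrm{poly}(1/\varepsilon)$ and for any reasonable relaxation $\mathrm{sdp}_{\mathcal{M}}$,

\[ \mathbb{E}\, \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}^3[V_\delta]) \ge \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}) - \varepsilon . \]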

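Proof. Let $\mathcal{G}'$ be as in Lemma 7.7 and let $X$ be an optimal solution for $\mathrm{sdp}_{\mathcal{M}}(\mathcal{G}')$.

Let $\mathcal{F}(V_\delta)$ denote the distribution over mappings $F \colon V \to V_\delta$, where for every vertex $i \in V$, we choose $F(i)$ to be a random neighbor of $i$ in $V_\delta$ (and if $i$ has no neighbor in $V_\delta$, we choose $F(i)$ to be a random vertex in $V_\delta$). For convenience, we introduce the notation $N(i, V_\delta)$ for the set of neighbors of $i$ in $V_\delta$ (if $i$ has no neighbor in $V_\delta$, we put $N(i, V_\delta) = V_\delta$). For each $F \sim \mathcal{F}(V_\delta)$ we define a decoded SDP solution $\mathcal{A}_F(X)$ for $\mathcal{G}$. Specifically, the entry corresponding to $i, j \in V$ and labels $a', b' \in [R]$ is defined as follows:

1. Let $F(i) = u$ and $F(j) = v$. Assigning the label $a'$ to $i$ forces $j$ to have label $b' = \pi_{j \leftarrow i}(a')$ and hence $u$ and $v$ must have labels $a = \pi_{u \leftarrow i}(a')$ and $b = \pi_{v \leftarrow j}(b')$.

2. Define

\[ \mathcal{A}_F(X)_{ia',jb'} := X_{ua,vb} = X_{F(i)\pi_{u \leftarrow i}(a'),\, F(j)\pi_{v \leftarrow j}(\pi_{j \leftarrow i}(a'))} . \]

Since $\mathcal{M}$ is reasonable (see Definition 7.1), we have $\mathcal{A}_F(X) \in \mathcal{M}$ for any mapping $F \colon V \to V_\delta$.
We define $\mathcal{A}_{\mathcal{F}(V_\delta)}(X) := \mathbb{E}_{F \sim \mathcal{F}(V_\delta)} \mathcal{A}_F(X)$. Since $\mathcal{M}$ is convex, we also have

\[ \mathcal{A}_{\mathcal{F}(V_\delta)}(X) = \mathbb{E}_{F \sim \mathcal{F}(V_\delta)} \mathcal{A}_F(X) \in \mathcal{M} . \]

We claim that

\[ \mathrm{sdp}(\mathcal{G})[\mathcal{A}_{\mathcal{F}(V_\delta)}(X)] = \mathrm{sdp}(\mathcal{G}')[X] . \]

Indeed,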



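\begin{align*}
\mathrm{sdp}(\mathcal{G})[\mathcal{A}_{\mathcal{F}(V_\delta)}(X)]
 &= 2\, \mathbb{E}_{ij \sim G}\, \mathbb{E}_{a' \in [R]}\, \mathcal{A}_{\mathcal{F}(V_\delta)}(X)_{ia',\, j\pi_{j \leftarrow i}(a')} \\
 &= 2\, \mathbb{E}_{ij \sim G}\, \mathbb{E}_{a' \in [R]}\, \mathbb{E}_{u \in N(i, V_\delta)}\, \mathbb{E}_{v \in N(j, V_\delta)}\, X_{u\pi_{u \leftarrow i}(a'),\, v\pi_{v \leftarrow j}(\pi_{j \leftarrow i}(a'))} \\
 &= 2\, \mathbb{E}_{ij \sim G}\, \mathbb{E}_{a \in [R]}\, \mathbb{E}_{u \in N(i, V_\delta)}\, \mathbb{E}_{v \in N(j, V_\delta)}\, X_{ua,\, v\pi_{v \leftarrow u}(a)} \qquad \text{(using (6))} \\
 &= 2\, \mathbb{E}_{uv \sim G'}\, \mathbb{E}_{a \in [R]}\, X_{ua,\, v\pi_{v \leftarrow u}(a)} \\
 &= \mathrm{sdp}(\mathcal{G}')[X] . \tag{7}
\end{align*}

It follows that

\begin{align*}
\mathrm{sdp}_{\mathcal{M}}(\mathcal{G}) &\le \mathrm{sdp}(\mathcal{G})[\mathcal{A}_{\mathcal{F}(V_\delta)}(X)] && \text{(using } \mathcal{A}_{\mathcal{F}(V_\delta)}(X) \in \mathcal{M}\text{)} \\
 &= \mathrm{sdp}(\mathcal{G}')[X] && \text{(using (7))} \\
 &= \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}') . \tag{8}
\end{align*}

We can now finish the proof of the lemma,

\begin{align*}
\mathbb{E}\, \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}^3[V_\delta]) &\ge \mathbb{E}\, \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}') - O(1) \cdot \mathbb{E} \|G^3[V_\delta] - G'\|_{TV} \\
 &\ge \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}) - \varepsilon \qquad \text{(using (8) and Lemma 7.7)} .
\end{align*}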



7.3 Putting things together 

By combining the previous two steps we can prove Theorem 7.2. 

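Proof of Theorem 7.2. We need to show that

\[ \tfrac{1}{9}\, \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}) - O(\varepsilon) \le \mathbb{E}\, \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}[V_\delta]) \le \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}) . \]

The upper bound on $\mathbb{E}\, \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}[V_\delta])$ is easy to show. We consider an optimal solution $X \in \mathcal{M}$ for $\mathcal{G}$. Note that the value of $X$ is preserved for $\mathcal{G}[V_\delta]$ in expectation, i.e.,

\[ \mathbb{E}\, \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}[V_\delta]) \le \mathbb{E}\, \mathrm{sdp}(\mathcal{G}[V_\delta])[X] = \mathrm{sdp}(\mathcal{G})[X] . \]

We combine the lemmas in this section to prove the lower bound. Notice that by Lemma 7.6, we can choose $\mathcal{H} = \mathcal{G}^3$ (for $c = 1/9$) in Theorem 7.4. With this choice of $\mathcal{H}$, we can finish the proof of the theorem,

\begin{align*}
\mathbb{E}\, \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}[V_\delta]) &\ge \tfrac{1}{9} \cdot \mathbb{E}\, \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}^3[V_\delta]) - \varepsilon && \text{(using Theorem 7.4)} \\
 &\ge \tfrac{1}{9} \cdot \mathrm{sdp}_{\mathcal{M}}(\mathcal{G}) - \tfrac{10}{9}\varepsilon && \text{(using Lemma 7.8)} .
\end{align*}

8 Proof of subsampling theorem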



In this section we prove our main subsampling theorem for $k$-CSPs. We will work with the following notion of density.

Definition 8.1 (density). We say that a $k$-CSP $\mathcal{P}$ is $\Delta$-dense if $|P| \le 1$ for every $P \in \mathcal{P}$ and furthermore for every $r \in [k]$ and fixing of $k - 1$ variables $I = (i_1, \ldots, i_{r-1}, *, i_{r+1}, \ldots, i_k)$, we have

\[ \sum_{P \in \mathcal{P} \,:\, \mathrm{Var}(P) = I} |P| \ge c\Delta \tag{9} \]

for some absolute constant $c > 0$.

Here, $\Delta$ is a parameter in $[1, n]$. The larger $\Delta$, the denser the instance. In a 2-CSP, $\Delta$ corresponds to the degree of each variable. In a dense $k$-CSP [AdlVKK03] we have $\Delta = \Theta(n)$.

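Theorem 8.2. Let $\varepsilon > 0$, $\Delta \ge 1$. Let $\mathcal{P}$ be a $\Delta$-dense $k$-CSP in $n$ variables over an alphabet of size $q$. Put $\delta \ge \varepsilon^{-C} \log(q)/\Delta$ for some absolute constant $C$. Suppose $U \subseteq [n]$ is chosen uniformly at random so that $|U| = \delta n$. Then,

\[ \left| \mathbb{E}\, \mathrm{opt}(\mathcal{P}_U) - \mathrm{opt}(\mathcal{P}) \right| \le \varepsilon . \tag{10} \]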

Remark 8.3. In the case $k = 2$, our notion of density reduces to the usual notion of density in a graph. We get the optimal trade-off between density and sample size in that case. When $k \ge 3$ there are $k$-CSPs with $n^{k-1}$ constraints that do not allow sparsification. For instance, consider a dense $k$-CSP in which all constraints share the same variable. We cannot subsample here, since we would likely lose that variable and hence all constraints.

Throughout the proof we will think of $k$ as an absolute constant and consider any function of $k$ as $O(1)$. We will also assume that coordinates in $[n]$ are sampled with replacement.
One direction of the theorem is immediate.



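Lemma 8.4.

\[ \mathbb{E}\Big[ \max_{x \in [q]^n} \mathcal{P}_U(x) \Big] \ge \max_{x \in [q]^n} \mathcal{P}(x) . \tag{11} \]

Proof. Suppose $x^* \in [q]^n$ maximizes $\mathcal{P}(x)$. Note that $\mathbb{E}[\max_{x \in [q]^n} \mathcal{P}_U(x)] \ge \mathbb{E}[\mathcal{P}_U(x^*)]$. On the other hand, $\mathbb{E}\, \mathcal{P}_U(x^*) = \mathcal{P}(x^*)$. ■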
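The identity $\mathbb{E}\, \mathcal{P}_U(x) = \mathcal{P}(x)$ used in this proof can be checked exactly on a small instance. The sketch below makes a simplifying assumption that differs from the text: each variable is placed in $U$ independently with probability $\delta$ (rather than drawing $|U| = \delta n$ coordinates), so that a predicate on $k$ distinct variables survives with probability exactly $\delta^k$ and the $\delta^{-k}$ rescaling cancels it. The instance and function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def cut(vals):
    """Max-Cut pay-off on an edge: 1 if the endpoints differ, else 0."""
    return 1 if vals[0] != vals[1] else 0

def full_value(preds, x, z):
    """P(x) = (1/z) * sum_P P(x), with z playing the role of |P|."""
    return Fraction(sum(fn(tuple(x[i] for i in vs)) for vs, fn in preds), z)

def expected_subsample_value(preds, x, n, delta, z):
    """Exact E[P_U(x)] when each variable joins U independently with prob. delta."""
    total = Fraction(0)
    for mask in product((0, 1), repeat=n):
        prob = Fraction(1)
        for bit in mask:
            prob *= delta if bit else 1 - delta
        chosen = {i for i, bit in enumerate(mask) if bit}
        # P_U keeps predicates inside U, each rescaled by delta^{-k}.
        val = sum(delta ** (-len(vs)) * fn(tuple(x[i] for i in vs))
                  for vs, fn in preds if set(vs) <= chosen)
        total += prob * val / z
    return total

path = [((0, 1), cut), ((1, 2), cut), ((2, 3), cut)]  # hypothetical toy instance
x = (0, 1, 1, 0)
delta = Fraction(1, 3)
expected = expected_subsample_value(path, x, n=4, delta=delta, z=3)
```

Since the maximum of an expectation is at most the expectation of the maximum, this identity at $x = x^*$ is all that (11) needs.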

The other direction requires all the work. We will split it up into two main lemmas. The first lemma shows that the subsampling step is random enough to give a concentration bound for large subsets of $[q]^n$.

Lemma 8.5 (Concentration). There are constants $c_0$, $c_1$ so that for $\delta_0 = \varepsilon^{-c_0} \log(q)/\Delta$ and randomly chosen $U \subseteq [n]$ of size $|U| \ge \delta_0 n$ we have that for every subset $\Psi \subseteq [q]^n$ of size $|\Psi| \le \exp(\varepsilon^{c_1}|U|)$,

\[ \left| \mathbb{E}\Big[ \max_{x \in \Psi} \mathcal{P}_U(x) \Big] - \max_{x \in \Psi} \mathcal{P}(x) \right| \le \varepsilon . \tag{12} \]

We think of $\delta_0 n$ as the smallest sample size for which we can expect concentration. The previous lemma shows that the maximum value of any fixed set of $\exp(\mathrm{poly}(\varepsilon)|U|)$ assignments is preserved when sampling $U$ of size larger than $\delta_0 n$.

The second main lemma shows that this concentration bound is actually good enough for us. Indeed, the maximum of the subsample turns out to have enough redundancy so that we can find a suitably small set of assignments in $[q]^n$ that captures the optimal value of the subsample up to a small error.

Lemma 8.6 (Structure). For every constant $c$ there is a constant $C$ and a set of assignments $\Psi \subseteq [q]^n$ of size $|\Psi| \le \exp(\varepsilon^{c}\delta n)$ where $\delta = \varepsilon^{-C} \log(q)/\Delta$ such that for randomly chosen $U$ of size $|U| = \delta n$, we have

\[ \mathbb{E}\Big[ \max_{x \in [q]^n} \mathcal{P}_U(x) \Big] \le \mathbb{E}\Big[ \max_{x \in \Psi} \mathcal{P}_U(x) \Big] + \varepsilon . \tag{13} \]

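Together these two lemmas directly imply the main subsampling theorem as shown next.

Proof of Theorem 8.2. In one direction, let $\Psi$ be the set from Lemma 8.6 which we obtain for $c = c_1$ where $c_1$ is the constant from Lemma 8.5. Let $C$ be the constant given by Lemma 8.6 for the given choice of $c$. Then with $\delta = \varepsilon^{-C} \log(q)/\Delta$, we have

\begin{align*}
\mathbb{E}\Big[ \max_{x \in [q]^n} \mathcal{P}_U(x) \Big]
 &\le \mathbb{E}\Big[ \max_{x \in \Psi} \mathcal{P}_U(x) \Big] + \varepsilon/2 && \text{(by Lemma 8.6)} \\
 &\le \max_{x \in \Psi} \mathcal{P}(x) + \varepsilon && \text{(by Lemma 8.5)} \\
 &\le \max_{x \in [q]^n} \mathcal{P}(x) + \varepsilon .
\end{align*}

The other direction follows from Lemma 8.4.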



8.1 Proof of Concentration Lemma 

Fix a vector $x \in [q]^n$. We will first analyze the case where we sample sets $U_1, U_2, \ldots, U_k \subseteq [n]$ independently at random of size $\delta_1 n$ ($\delta_1$ is some parameter that we'll instantiate later) and we keep all constraints whose $r$-th variable is contained in $U_r$. Later we will be able to conclude the case where $U_1 = U_2 = \cdots = U_k$.

The argument proceeds in $k$ steps. At each step $r$ we restrict $\mathcal{P}$ to those constraints whose $r$-th variable is contained in $U_r$. After each step we perform a pruning operation in which we remove variables whose influence has become too large. We then argue that the pruned CSP has the desired concentration properties and moreover that pruning doesn't remove too many constraints in expectation.

Denote by $\mathcal{P}_1$ the CSP obtained from $\mathcal{P}$ by throwing away all predicates whose first variable is not in $U_1$. Then normalize by a factor $\delta_1^{-1}$, since we expect to keep only a $\delta_1$ fraction of the predicates. Formally,

\[ \mathcal{P}_1 = \{ \delta_1^{-1} P \mid \mathrm{Var}_1(P) \in U_1,\ P \in \mathcal{P} \} . \]

Now, let $\mathrm{Inf}_i(\mathcal{P}_1) = \sum_{P \in \mathcal{P}_1 \,:\, i \in \mathrm{Var}(P)} |P|$ denote the influence of variable $i$ in $\mathcal{P}_1$. Let $N = \mathbb{E}\, \mathrm{Inf}_i(\mathcal{P}_1) = O(1/n)$. As mentioned before, we will throw away all predicates that contain a variable whose influence in $\mathcal{P}_1$ has become too large, say, larger than $2N$,

\[ \mathcal{P}_1^{\mathrm{prune}} = \{ P \in \mathcal{P}_1 \mid \forall i \in \mathrm{Var}(P) \colon \mathrm{Inf}_i(\mathcal{P}_1) \le 2N \} . \]

We think of this as the pruning of $\mathcal{P}_1$. Continue this process, inductively, by putting

\[ \mathcal{P}_r = \{ \delta_1^{-1} P \mid \mathrm{Var}_r(P) \in U_r,\ P \in \mathcal{P}_{r-1}^{\mathrm{prune}} \} , \]

and

\[ \mathcal{P}_r^{\mathrm{prune}} = \{ P \in \mathcal{P}_r \mid \forall i \in \mathrm{Var}(P) \colon \mathrm{Inf}_i(\mathcal{P}_r) \le 2^r N \} . \]

Here $\mathrm{Inf}_i(\mathcal{P}_r) = \sum_{P \in \mathcal{P}_r \,:\, i \in \mathrm{Var}(P)} |P|$. Note that $\mathcal{P}_r^{\mathrm{prune}}$ will still have maximum influence at most $O(1/n)$.
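One round of the restrict-and-prune process can be sketched as follows. This is a simplified illustration with assumed conventions: each predicate is represented only by its ordered variable tuple and its norm $|P|$, influences are left unnormalized, positions are 0-indexed, and the pruning threshold is passed in explicitly instead of being derived from $N$. The function names and the toy instance are hypothetical.

```python
def influences(preds):
    """Inf_i = sum of |P| over the predicates P that contain variable i."""
    inf = {}
    for vs, nrm in preds:
        for i in vs:
            inf[i] = inf.get(i, 0.0) + nrm
    return inf

def restrict_and_prune(preds, r, U_r, delta1, threshold):
    """One step: keep predicates whose r-th variable lies in U_r, rescale by
    1/delta1, then drop every predicate touching a variable of influence > threshold."""
    kept = [(vs, nrm / delta1) for vs, nrm in preds if vs[r] in U_r]
    inf = influences(kept)
    pruned = [(vs, nrm) for vs, nrm in kept
              if all(inf[i] <= threshold for i in vs)]
    return kept, pruned

# Hypothetical toy 2-CSP in which variable 0 is unusually influential.
preds = [((0, 1), 1.0), ((0, 2), 1.0), ((0, 3), 1.0), ((1, 2), 1.0)]
kept, pruned = restrict_and_prune(preds, r=0, U_r={0, 1}, delta1=0.5, threshold=5.0)
```

In the toy run all four predicates survive the restriction, variable 0 ends up with influence 6, and pruning removes the three predicates that contain it.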

In the following, when we write Prix) we think of it as normalized in the same way we normalize Pu{x), 
i.e., by a factor l/\P\. 
Lemma 8.7. For every $x \in [q]^n$,

$$\Pr\big(|\mathcal{P}_k(x) - \mathcal{P}(x)| > t\varepsilon\big) \le O\big(\exp(-t^2\varepsilon^2\delta_1 n)\big). \qquad (14)$$

Proof. The proof proceeds in $k$ steps. At each step we will apply a variant of Azuma's inequality (sometimes
referred to as McDiarmid's inequality) given by Lemma D.3.
Specifically, we claim that

$$\Pr\big[\mathcal{P}_r(x) > \mathcal{P}_{r-1}^{\mathrm{prun}}(x) + t\varepsilon\big] \le O\big(\exp(-t^2\varepsilon^2\delta_1 n)\big), \qquad (15)$$

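for every $0 < r \le k$. We define the mapping

$$f_r(U_r) = \mathcal{P}_r(x),$$

where we think of $U_r$ as a tuple $(i_1, \dots, i_{\delta_1 n})$, each coordinate being an index in $[n]$. Note that

$$\mathbb{E}\, f_r(U_r) = \mathcal{P}_{r-1}^{\mathrm{prun}}(x).$$

We claim that $f_r$ has Lipschitz constant $O(1/\delta_1 n)$ in the sense that replacing any coordinate $i \in U_r$ by an
$i' \in [n]$ can change the function value by at most $O(1/\delta_1 n)$. This is because the influence of each variable in
$\mathcal{P}_{r-1}^{\mathrm{prun}}$ is at most $O(1/n)$.
Lemma D.3 then implies

$$\Pr\{f_r > \mathbb{E} f_r + t\varepsilon\} \le \exp\!\left(-\frac{\Omega(t^2\varepsilon^2)}{\delta_1 n\cdot O(1/\delta_1 n)^2}\right) = \exp\big(-\Omega(t^2\varepsilon^2\delta_1 n)\big).$$

This is what we claimed in (15). By a union bound, (15) holds for all $r \in [k]$. Hence, we can chain these
inequalities together and get

$$\Pr\big(|\mathcal{P}_k(x) - \mathcal{P}(x)| > t\varepsilon\big) \le O\big(\exp(-t^2\varepsilon^2\delta_1 n)\big). \qquad\blacksquare$$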



We'd like to argue that at every pruning step only a few predicates get removed and hence $\mathcal{P}_r^{\mathrm{prun}}$ and $\mathcal{P}_r$
are close. Specifically we'd like to show that the influence of $i$ has enough concentration so that it is larger
than twice its expectation only with small probability. This directly gives us a bound on the expected amount
of pruning. The key observation is the next lemma which shows that the degree of each fixing $I$ of $k - 1$
variables is concentrated.

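Lemma 8.8. Assume $\delta_1 \ge \mathrm{poly}(1/\varepsilon)/\Delta$, fix $I = (i_1, \dots, i_{r-1}, *, i_{r+1}, \dots, i_k)$ and let $Q = \{P \in \mathcal{P} \mid \mathrm{Var}(P) = I\}$. Then,

$$\mathbb{E}\,\Big|\sum_{P\in Q:\ \mathrm{Var}_r(P)\in U_r}|P| \;-\; \delta_1\Delta\Big| \le \varepsilon\,\delta_1\Delta. \qquad (16)$$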
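Proof. By the density condition on $\mathcal{P}$ we have $\sum_{P\in Q}|P| \ge \Omega(\Delta)$ and $|P| \le 1$. (In particular, $|Q| \ge \Omega(\Delta)$.)
Consider the random variable

$$Z = \sum_{P\in Q,\ \mathrm{Var}_r(P)\in U_r}|P|,$$

which sums the norm of all predicates in $Q$ that are selected by $U_r$. Let $\mu = \mathbb{E}Z = \delta_1\Delta$. We can express $Z$ as a
sum of independent variables $Z = \sum_{i=1}^{\delta_1 n} Z_i$, where $Z_i$ is the outcome of the $i$-th sample in $U_r$. Since we sampled
with replacement, the $Z_i$'s are independent and identically distributed. Every $Z_i$ assumes each value $|P|$ for
$P \in Q$ with probability $1/n$. We note that $\mathbb{E}Z_i = \frac{1}{n}\sum_{P\in Q}|P| = \Theta(\Delta/n)$. Let us compute the fourth moment of
$Z - \mathbb{E}Z$. First observe that $\mathbb{E}(Z_i - \mathbb{E}Z_i)^4 \le O(\mathbb{E}|Z_i|^4)$ and

$$\mathbb{E}|Z_i|^4 = \frac{1}{n}\sum_{P\in Q}|P|^4 \le \frac{O(\Delta)}{n}.$$

Similarly, for any $i \ne k$:

$$\mathbb{E}(Z_i - \mathbb{E}Z_i)^2(Z_k - \mathbb{E}Z_k)^2 \le \frac{O(1)}{n^2}\sum_{P,P'\in Q}|P|^2|P'|^2 \le \frac{O(\Delta^2)}{n^2}.$$

By independence and the fact that $\mathbb{E}(Z_i - \mathbb{E}Z_i) = 0$, we therefore have

$$\mathbb{E}(Z - \mathbb{E}Z)^4 = \sum_i \mathbb{E}(Z_i - \mathbb{E}Z_i)^4 + 6\sum_{i<k}\mathbb{E}(Z_i - \mathbb{E}Z_i)^2\,\mathbb{E}(Z_k - \mathbb{E}Z_k)^2
\le \delta_1 n\cdot\frac{O(\Delta)}{n} + (\delta_1 n)^2\cdot\frac{O(\Delta^2)}{n^2} = O(\mu + \mu^2).$$

Thus, by Markov's inequality (and $\mu \ge 1$),

$$\Pr\big(|Z - \mathbb{E}Z| > t\varepsilon\mu\big) \le \frac{\mathbb{E}(Z - \mathbb{E}Z)^4}{(t\varepsilon\mu)^4} \le \frac{O(1)}{t^4\varepsilon^4\mu^2}. \qquad (17)$$

Therefore we can bound $|Z - \mathbb{E}Z|$ in expectation by integrating (17) over $t \ge 1$,

$$\int_{t\ge 1} t\cdot\Pr\big(|Z - \mathbb{E}Z| > t\varepsilon\mu\big)\,dt \le \int_{t\ge 1}\frac{O(1)}{t^3\varepsilon^4\mu^2}\,dt \le \frac{O(1)}{\varepsilon^4\mu^2}\int_{t\ge 1}\frac{1}{t^3}\,dt \le \varepsilon \qquad (18)$$

for $\mu$ larger than a sufficiently large polynomial in $1/\varepsilon$, i.e., $\delta_1 \ge \mathrm{poly}(1/\varepsilon)/\Delta$. ■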
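The fourth-moment expansion used in this proof (a sum of independent, centered summands) can be checked exactly on a toy distribution. The sketch below is illustrative only: it assumes each draw is uniform over a small list `values` (standing in for the weights $|P|$), and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def central_moment_of_sum(values, m, power):
    """Exact E[(Z - EZ)^power], Z = sum of m i.i.d. uniform draws from values."""
    n = len(values)
    mean = Fraction(sum(values)) / n
    total = Fraction(0)
    for draw in product(values, repeat=m):  # enumerate all n^m outcomes
        total += (sum(draw) - m * mean) ** power
    return total / n ** m

def expansion(values, m):
    """sum_i E[Y_i^4] + 6 * sum_{i<k} E[Y_i^2] E[Y_k^2] for centered i.i.d. Y_i."""
    n = len(values)
    mean = Fraction(sum(values)) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    pairs = m * (m - 1) // 2  # number of index pairs i < k
    return m * m4 + 6 * pairs * m2 * m2
```

For any small input the two functions agree, which is the identity behind the bound: all cross terms containing a single centered factor vanish by independence.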
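Lemma 8.9. For every $0 < r \le k$,

$$\mathbb{E}\sum_{P\in\mathcal{P}_r\setminus\mathcal{P}_r^{\mathrm{prun}}}|P| \le \varepsilon. \qquad (19)$$

Proof. We'd like to bound

$$\sum_{P\in\mathcal{P}_r\setminus\mathcal{P}_r^{\mathrm{prun}}}|P| \le 2\sum_{i=1}^{n}\big|\mathrm{Inf}_i(\mathcal{P}_r) - \mathbb{E}\,\mathrm{Inf}_i(\mathcal{P}_r)\big| \qquad (20)$$

in expectation over $U_r$ by $O(\varepsilon)$. We will bound $\mathbb{E}|\mathrm{Inf}_i(\mathcal{P}_r) - \mathbb{E}\,\mathrm{Inf}_i(\mathcal{P}_r)|$ for every fixing $I$ that includes
variable $i$ and fixes all but the variable in position $r$. Indeed fix $I = (i_1, \dots, i_{r-1}, *, i_{r+1}, \dots, i_k)$. Let $Q = \{P \in \mathcal{P}_r \mid \mathrm{Var}(P) = I\}$.
We will bound the expected gain of influence of variable $i$ in $Q$. By linearity of expectation
this will give us a bound on (20). Let $Z = \sum_{P\in Q:\ \mathrm{Var}_r(P)\in U_r}|P|$. By Lemma 8.8,

$$\mathbb{E}\,|\mathbb{E}Z - Z| \le \varepsilon\,\mathbb{E}Z.$$

Since we have this bound for every fixing and these fixings form a partition of $\mathcal{P}$, we find that after
renormalization, we have

$$\mathbb{E}\,\big|\mathrm{Inf}_i(\mathcal{P}_r) - \mathbb{E}\,\mathrm{Inf}_i(\mathcal{P}_r)\big| \le \frac{\varepsilon}{n}. \qquad\blacksquare$$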



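Let us denote by $\mathcal{P}'_r$ the CSP that is obtained in the exact same way as $\mathcal{P}_r$ except without the pruning step.
In particular, $\mathcal{P}'_k$ is simply the CSP $\mathcal{P}$ in which the $r$-th variable is restricted to $U_r$ for each $r \in [k]$.
The next corollary summarizes what we have shown so far.

Corollary 8.10. Let $\Psi \subseteq [q]^n$ of size $\exp(\Omega(\varepsilon^2\delta_1 n))$. Then,

$$\mathbb{E}\,\Big|\max_{x\in\Psi}\mathcal{P}'_k(x) - \max_{x\in\Psi}\mathcal{P}(x)\Big| \le \varepsilon. \qquad (21)$$

Proof. We first note that $\mathcal{P}_k \subseteq \mathcal{P}'_k$ and we can get

$$\mathbb{E}\,|\mathcal{P}'_k\setminus\mathcal{P}_k| \le \varepsilon/2.$$

This follows from repeated application of Lemma 8.9 (with sufficiently small value of $\varepsilon$) for each $r \in [k]$. In
particular this shows that

$$\Big|\mathbb{E}\max_{x\in\Psi}\mathcal{P}'_k(x) - \mathbb{E}\max_{x\in\Psi}\mathcal{P}_k(x)\Big| \le \varepsilon/2. \qquad (22)$$

On the other hand, by Lemma 8.7 and the union bound over $x \in \Psi$, we get that

$$\mathbb{E}\,\Big|\max_{x\in\Psi}\mathcal{P}_k(x) - \max_{x\in\Psi}\mathcal{P}(x)\Big| \le \varepsilon/2. \qquad (23)$$

Here we used the fact that the probability that the maximum deviates by $t\cdot\varepsilon$ drops off exponentially in $t$ so
that we can integrate over $t \ge 1$ to get a bound on the expectation. Thus,

$$\mathbb{E}\,\Big|\max_{x\in\Psi}\mathcal{P}'_k(x) - \max_{x\in\Psi}\mathcal{P}(x)\Big| \le \varepsilon,$$

which is what we wanted to show. ■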



We are now ready to prove the first main lemma. The proof reduces the general case to the case where 
each coordinate is subsampled independently as previously dealt with. The idea is to partition the set of 
variables into m bins and only consider predicates whose variables fall into k distinct bins. The total weight 
of the remaining predicates can be neglected for large enough m. 
Proof of Lemma 8.5 (Concentration). Partition $[n]$ randomly into $m$ bins, i.e., $[n] = S_1 \cup S_2 \cup \dots \cup S_m$, with
$m = O(1/\varepsilon^2)$ (say). Furthermore, let $U_\ell = U \cap S_\ell$ for $\ell \in [m]$. One can show that with probability $1 - o(1)$,
for all $\ell \in [m]$ we have $|U_\ell| = \Theta(|U|/m)$.

For a given $P \in \mathcal{P}$ the probability that there are $i, j \in \mathrm{Var}(P)$ and $\ell \in [m]$ so that $i \in S_\ell$ and $j \in S_\ell$ is at
most $O(\varepsilon^2)$. Hence, we can throw away all such $P$ and lose only an $O(\varepsilon^2)$ fraction in expectation.

On the other hand, for every $u \in [m]^k$ with pairwise distinct coordinates, we let $\mathcal{P}^u = \{P \in \mathcal{P} \mid \forall r : \mathrm{Var}_r(P) \in S_{u_r}\}$.
For every $\mathcal{P}^u$ we may then apply Corollary 8.10, since $U_{u_1}, U_{u_2}, \dots, U_{u_k}$ are independently chosen. We
apply the corollary with $\varepsilon' = \varepsilon/m^k = \mathrm{poly}(\varepsilon)$. This requires us to choose $\delta_0$ large enough as a function of $\varepsilon$
so that the previous lemmas (in particular Lemma 8.9) apply to subsets of size $|U_r|$. This allows us to sum
the error over all applications of the Corollary for a total error of $\varepsilon$. The Corollary applies to sets $\Psi$ of size
$\exp(\Omega(\varepsilon^2|U_r|)) = \exp(\mathrm{poly}(\varepsilon)|U|)$, which is what we needed. ■
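The binning step relies on the fact that a predicate rarely has two of its variables in the same bin. A small exact computation illustrates this, under the assumption (not stated in the text) that the $k$ variables of a predicate are distinct and are assigned to the $m$ bins independently and uniformly; the function names are hypothetical.

```python
from fractions import Fraction

def collision_probability(k, m):
    """Exact probability that k independently binned variables do NOT hit k distinct bins."""
    distinct = Fraction(1)
    for j in range(k):
        distinct *= Fraction(m - j, m)  # j bins already used by earlier variables
    return 1 - distinct

def union_bound(k, m):
    """Union bound over the k(k-1)/2 pairs, each colliding with probability 1/m."""
    return Fraction(k * (k - 1), 2 * m)
```

For constant arity $k$ the collision probability is $O(1/m)$, so taking $m$ polynomially large in $1/\varepsilon$ makes the discarded weight small.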

8.2 Proof of Structure Lemma 

Proof Idea. The main idea is the following. We have a subsample $U$ of size $\delta n$. Hence, $\max_{x\in[q]^n}\mathcal{P}_U(x)$ is
a maximization problem in $\delta n$ variables. In particular the maximum is achieved by one of roughly $2^{\log(q)\delta n}$
assignments to these variables. The whole problem is that we need a set of assignments $\Psi$ of size $2^{\mathrm{poly}(\varepsilon)\delta n} \ll 2^{\delta n}$
with the property that one of the assignments in $\Psi$ is near optimal with respect to $\mathcal{P}_U$.

The proof strategy is to design a deterministic algorithm $D(y)$ that is given a seed $y \in [q]^S$ where $S \subseteq U$.
The algorithm returns an assignment $x = D(y)$ to the variables in $U$ with the guarantee that for some seed
$y \in [q]^S$, the induced assignment $x = D(y)$ is near optimal in $\mathcal{P}_U$. An important parameter is the seed
length of $D$, i.e., the size of $S$. It is also crucial that the algorithm does not know $U$ but only $S$ and $\mathcal{P}_S$.
(Otherwise the algorithm could trivially return an optimal assignment for $\mathcal{P}_U$.) Specifically, we want to
achieve seed length $|S| \le \mathrm{poly}(\varepsilon)\delta n/\log(q)$. This will suffice for the purpose of our proof, since then we can
put $\Psi = \{D(y) : y \in [q]^S\}$. In this case $\Psi$ will be sufficiently small.

The key point in the proof is to choose $U$ so large that for every $x \in [q]^n$ both $\mathcal{P}_U(x)$ and $\mathcal{P}_S(x)$ are a
good approximation of $\mathcal{P}(x)$. This fact will be the main reason why we can hope to obtain a near optimal
assignment for $\mathcal{P}_U$ by just looking at $\mathcal{P}_S$. We remark that this proof strategy is due to [GGR98]. Formally we
will prove the next lemma.

Lemma 8.11. For every constant $c$, there is another constant $C$ and a deterministic algorithm $D : [q]^S \to [q]^U$
which extends an assignment to the coordinates $S$ to an assignment to the coordinates in $U$ so that

$$\mathbb{E}\max_{x\in[q]^S}\mathcal{P}_U(D(x)) \ge \mathbb{E}\max_{x\in[q]^n}\mathcal{P}_U(x) - \varepsilon.$$

Here the expectation is taken over random $U \subseteq [n]$ of size $\delta n = \varepsilon^{-C}\delta_0 n$ and random subset $S \subseteq U$ of size
$|S| \le \varepsilon^{c}\delta n/\log(q)$.

Once we have this lemma it will be easy to conclude the Structure Lemma. We will next describe our 
algorithm and then prove Lemma 8.11. 

Deterministic greedy algorithm. Let $\alpha = \varepsilon^{c}/\log(q)$, the factor by which $S$ needs to be smaller than $U$.
Assume a fixed partition of $U$ into $m$ pairwise disjoint sets $U = U_1 \cup U_2 \cup \dots \cup U_m$ of equal size. Here
$m$ is some parameter that we'll need and determine later. Choose $S_\ell$ uniformly at random from $U\setminus U_\ell$ of
size $|S_\ell| = \frac{\alpha}{m}|U|$. Let $S = S_1 \cup \dots \cup S_m$. Note that $|S| = \alpha|U|$. We want $|S_\ell| \ge \delta_0 n$ so
that the concentration lemma will apply even to the sets $S_\ell$. We take $m = \mathrm{poly}(1/\varepsilon)$ sufficiently large.
Hence, $\alpha/m$ is some fixed polynomial in $\varepsilon$ and this determines the size of $U$.
The algorithm $D$ works as follows.

Input: $x \in [q]^S$
Output: $z \in [q]^U$
Algorithm:

- For every $i \in S$, we put $z_i = x_i$.

- For every $\ell \in [m]$, do the following: Let $y^* \in [q]^{U_\ell\setminus S}$ denote the partial assignment that maximizes
the function $f(y) = \mathcal{P}_{S_\ell}(x[y])$, where $x[y]$ denotes the assignment which is equal to $y$ for all
coordinates $i \in U_\ell\setminus S$ and equal to $x$ elsewhere. Formally, let

$$y^* = \arg\max_{y\in[q]^{U_\ell\setminus S}}\mathcal{P}_{S_\ell}(x[y]) \qquad (24)$$

and put $z_i = y^*_i$ for every $i \in U_\ell\setminus S$.

This defines an assignment $z$ to all coordinates in $U$.
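The extension procedure can be sketched on a toy instance. Everything concrete below is an assumption made for illustration and is not taken from the text: predicates are weighted "not equal" constraints `(i, j, w)`, the score $\mathcal{P}_{S_\ell}$ is interpreted as the weight of satisfied predicates whose variables all lie in $S_\ell$ together with the block being assigned, and the maximization in (24) is done by brute force. The names `value` and `extend` are hypothetical.

```python
from itertools import product

def value(preds, assignment, allowed):
    """Weight of satisfied 'x_i != x_j' predicates whose variables all lie in allowed."""
    return sum(w for i, j, w in preds
               if i in allowed and j in allowed and assignment[i] != assignment[j])

def extend(preds, q, blocks, seeds, x):
    """Extend a seed assignment x (a dict on S) to all of U, block by block.

    blocks[l] plays the role of U_l and seeds[l] the role of S_l
    (a subset of U minus U_l); S is the union of the seeds.
    """
    S = set().union(*seeds)
    z = dict(x)  # coordinates in S are copied from the seed
    for U_l, S_l in zip(blocks, seeds):
        free = sorted(set(U_l) - S)       # coordinates of this block to assign
        scope = set(S_l) | set(free)      # assumed scope of the score P_{S_l}
        best, best_val = None, -1.0
        for y in product(range(q), repeat=len(free)):  # brute force, q^|free| options
            trial = dict(x)
            trial.update(zip(free, y))    # this is x[y]
            val = value(preds, trial, scope)
            if val > best_val:
                best, best_val = y, val
        z.update(zip(free, best))
    return z
```

The point of the construction is visible in the signature: the output depends on `x` only through its values on `S`, so there are at most $q^{|S|}$ distinct outputs. The brute-force search is exponential in the block size and is meant for tiny examples only.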



Analysis. To analyze our algorithm, let $x^* \in [q]^n$ denote the assignment that maximizes $\max_{x\in[q]^n}\mathcal{P}_U(x)$.
Our goal is to define a sequence of "hybrid" assignments $x^0, \dots, x^m$ where $x^0 = x^*$ and $x^m = D(y)$ for
some $y \in [q]^S$ so that in expectation over $U$ and $S$, we have

$$\mathcal{P}_U(x^m) \ge \mathcal{P}_U(x^0) - \varepsilon.$$

The sequence is defined as follows. Let $x^0 = x^*$. Inductively, let $x^\ell$ for $0 < \ell \le m$ be equal to $x^{\ell-1}$ in all
coordinates except $U_\ell$. The coordinates are induced from $S_\ell$ exactly as in our algorithm in equation (24),
i.e., for all $i \in U_\ell\setminus S$, we let $x^\ell_i = y^*_i$ where $y^*$ is defined as

$$y^* = \arg\max_{y\in[q]^{U_\ell\setminus S}}\mathcal{P}_{S_\ell}(x^{\ell-1}[y]).$$

We observe that indeed $x^m$ is generated by $D$ for some $x \in [q]^S$, since all coordinates in $x^m$ are induced by
looking only at coordinates in $S$ (though it need not be the case that $x^m = D(x^*)$). Now, denote the error at
step $\ell$ by

$$\mathrm{err}(\ell) = \mathcal{P}_U(x^{\ell-1}) - \mathcal{P}_U(x^\ell).$$

Note that $\mathrm{err}(\ell)$ is a random variable depending on both $U$ and $S$. The claim now reduces to showing

$$\mathbb{E}\sum_{\ell\in[m]}\mathrm{err}(\ell) \le \varepsilon, \qquad (25)$$

since by definition $\mathcal{P}_U(x^*) - \mathcal{P}_U(x^m) \le \sum_{\ell\in[m]}\mathrm{err}(\ell)$.

In order to argue (25), it will be convenient to consider

$$\widetilde{\mathrm{err}}(\ell) = \mathcal{P}_{U\setminus U_\ell}(x^{\ell-1}) - \mathcal{P}_{U\setminus U_\ell}(x^\ell).$$

Since we chose $m$ large enough and hence $U_\ell$ is a sufficiently small fraction of $U$, it follows that

$$\sum_{\ell\in[m]}\mathrm{err}(\ell) \le \sum_{\ell\in[m]}\widetilde{\mathrm{err}}(\ell) + \frac{\varepsilon}{2}. \qquad (26)$$
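Let

$$z^* = \arg\max_{y\in[q]^{U_\ell\setminus S}}\mathcal{P}_{U\setminus U_\ell}(x^{\ell-1}[y]).$$

Note that here we are maximizing over $\mathcal{P}_{U\setminus U_\ell}$ rather than $\mathcal{P}_{S_\ell}$. The following lemma gives us a concrete way
of bounding $\widetilde{\mathrm{err}}(\ell)$.

Lemma 8.12.

$$\widetilde{\mathrm{err}}(\ell) \le \big|\mathcal{P}_{U\setminus U_\ell}(x^{\ell-1}[z^*]) - \mathcal{P}_{S_\ell}(x^{\ell-1}[z^*])\big| \qquad (27)$$

Proof. If (27) were false, then we would have

$$\mathcal{P}_{S_\ell}(x^{\ell-1}[z^*]) > \mathcal{P}_{S_\ell}(x^{\ell-1}[y^*]).$$

But this is a contradiction, since we chose $y^*$ as the maximum with respect to $\mathcal{P}_{S_\ell}$. ■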

The next lemma shows that the RHS above is small in expectation. The reason is that $S_\ell$ is chosen
uniformly at random inside $U\setminus U_\ell$. Let $x = x^{\ell-1}[z^*]$. We have $\mathbb{E}\,\mathcal{P}_{S_\ell}(x) = \mathcal{P}_{U\setminus U_\ell}(x)$. We only need to argue
that the average deviation of $\mathcal{P}_{S_\ell}(x)$ from its mean is small. This can be argued directly but it also follows from
Lemma 8.5 applied to $\mathcal{P}_{U\setminus U_\ell}$ and $\Psi = \{x\}$. To apply this lemma, we actually need that $\mathcal{P}_{U\setminus U_\ell}$ is sufficiently
close to being sufficiently dense. This is true in expectation over $U$.

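Lemma 8.13.

$$\mathbb{E}\,\big|\mathcal{P}_{U\setminus U_\ell}(x) - \mathcal{P}_{S_\ell}(x)\big| \le \frac{\varepsilon}{2m} \qquad (28)$$

Proof. As mentioned before we think of $\mathcal{P}_{S_\ell}$ as a subsample of $\mathcal{P}_{U\setminus U_\ell}$. We would like to apply Lemma 8.5
(Concentration) to conclude the claim. However, $\mathcal{P}_{U\setminus U_\ell}$ need not satisfy the density condition. Nevertheless, by
Lemma 8.8, $\mathcal{P}_{U\setminus U_\ell}$ does satisfy, for every fixing $I$ of $k - 1$ variables,

$$\mathbb{E}_U\,\Big|\sum_{P\in\mathcal{P}_{U\setminus U_\ell},\ \mathrm{Var}(P)=I}|P| - \delta\Delta\Big| \le \varepsilon'\delta\Delta. \qquad (29)$$

In other words, every fixing $I$ satisfies the density requirement in expectation. We can therefore treat $\mathcal{P}_{U\setminus U_\ell}$
as a $\delta\Delta$-dense CSP and subsample $S_\ell \subseteq U\setminus U_\ell$ from it. Note that we can take $\delta\Delta = \mathrm{poly}(1/\varepsilon)$ arbitrarily large
so that we may subsample an $\alpha$ fraction of the variables of $U\setminus U_\ell$ and expect error $\varepsilon/4m$ in the application of
Lemma 8.5. The fact that $\mathcal{P}_{U\setminus U_\ell}$ satisfies only (29) leads to additional approximation errors in the application
of Lemma 8.5. By summing (29) over all possible $I$, we can bound these errors by $\varepsilon'$. Again taking $\delta\Delta$ large
enough we can assure $\varepsilon' \le \varepsilon/4m$. Hence, we get a total expected error of $\varepsilon/2m$, which is what we wanted to
show. ■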



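Combining Lemma 8.12 with Lemma 8.13, we conclude that for every $\ell \in [m]$,

$$\mathbb{E}\,\widetilde{\mathrm{err}}(\ell) \le \frac{\varepsilon}{2m}. \qquad (30)$$

Finally,

$$
\begin{aligned}
\mathbb{E}\sum_{\ell\in[m]}\mathrm{err}(\ell)
&\le \mathbb{E}\sum_{\ell\in[m]}\widetilde{\mathrm{err}}(\ell) + \frac{\varepsilon}{2} && \text{(by (26))}\\
&= \sum_{\ell\in[m]}\mathbb{E}\,\widetilde{\mathrm{err}}(\ell) + \frac{\varepsilon}{2}\\
&\le m\cdot\frac{\varepsilon}{2m} + \frac{\varepsilon}{2} && \text{(using (30))}\\
&= \varepsilon.
\end{aligned}
$$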
We can now complete the proof of the Structure Lemma. We would like to put $\Psi(S) = \{D(x) \mid x \in [q]^S\}$.
Then, by Lemma 8.11,

$$\mathbb{E}\max_{x\in\Psi(S)}\mathcal{P}_U(x) \ge \mathbb{E}\max_{x\in[q]^n}\mathcal{P}_U(x) - \varepsilon. \qquad (31)$$

We are not quite done, since the Structure Lemma requires a single fixed set $\Psi$. So far we are choosing $S$
randomly as a subset of $U$. Hence, the set $\Psi(S)$ that we constructed above depends on the choice of $U$. To
finish the proof we need a single set $\Psi \subseteq [q]^n$ that is independent of the choice of $U$. This is easy to accomplish
from what we have. Simply pick $S$ and $U$ independently and consider $U' = U \cup S$. Since $|S| \le \mathrm{poly}(\varepsilon)|U|$, we
can make the difference between $\mathcal{P}_U(x)$ and $\mathcal{P}_{U'}(x)$ negligible for any $x \in [q]^n$. Therefore, we may exchange
$U'$ for $U$ in the previous argument so that the choice of $S$ and $U$ is independent. Since (31) is then true in
expectation taken over independent $S$ and $U$, there must also exist a fixed choice of $S$ for which (31) is true
in expectation taken over $U$. But now we may take $\Psi = \Psi(S)$ in order to conclude the proof of the Structure
Lemma.
A Edge distribution of subsample of the third power

In this section we compare the edge distribution of the subsample of $G^3$ to a somewhat nicer distribution. This
step was needed in Lemma 7.7. In the following let $G = (V, E)$ be a $\Delta$-regular graph and $\delta \ge \mathrm{poly}(\varepsilon^{-1})\Delta^{-1}$.
Further denote $W = V_\delta$.

Lemma A.1. Let $D_1$ denote the uniform distribution over edges in $G^3[W]$. Let $D_2$ denote the distribution
obtained as follows:

1. Pick a random edge $(v, v') \in E$.

2. Choose uniformly at random $w \in N_W(v)$ and $w' \in N_W(v')$.

3. Output $(w, w')$.

Then,

$$\mathbb{E}_W\big[\mathrm{TV}(D_1, D_2)\big] \le \varepsilon.$$

Here and in the following $\mathrm{TV}(D_1, D_2)$ denotes the total variation distance between the two distributions $D_1$
and $D_2$.
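For concreteness, the two ingredients of the lemma, total variation distance and the sampling procedure for $D_2$, can be sketched as follows. The graph is assumed to be given as an adjacency dictionary, and the rule of resampling when a vertex has no neighbour in $W$ is an assumption added here to make the sketch well defined; `total_variation` and `sample_d2` are hypothetical names.

```python
import random

def total_variation(p, q):
    """TV distance between two distributions given as dicts outcome -> probability."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in support)

def sample_d2(adj, W, rng):
    """One draw from D2: random edge (v, v'), then random W-neighbours of v and v'."""
    # Ordered pairs, so every undirected edge appears once in each direction.
    edges = [(v, u) for v in adj for u in adj[v]]
    while True:
        v, v2 = rng.choice(edges)
        nw = [u for u in adj[v] if u in W]
        nw2 = [u for u in adj[v2] if u in W]
        if nw and nw2:  # assumption: resample if a neighbourhood misses W
            return rng.choice(nw), rng.choice(nw2)
```

Estimating $\mathrm{TV}(D_1, D_2)$ empirically would then amount to tabulating many draws of `sample_d2` and comparing against the uniform distribution on edges of $G^3[W]$.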

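Proof. Let us compare the following two distributions:

$P_1$: Pick a uniformly random path $p = (w, v, v', w')$ from the set of all paths of length 3 in $G$ which have
$w, w' \in W$.

$P_2$: Pick a random edge $(v, v') \in E$ and random neighbors $w \in N_W(v)$, $w' \in N_W(v')$ and consider the path
$(w, v, v', w')$.

Notice that it suffices to bound the statistical distance between $P_1$ and $P_2$. This is because $D_1$ is just the
marginal distribution of $P_1$ on the endpoints of the path $(w, w')$. Likewise $D_2$ is the marginal distribution of
$P_2$ on $(w, w')$.

Now, let $p = (w, v, v', w')$ denote any path of length 3 in $G$ so that $w, w' \in W$. Let $N$ denote the number
of such paths. Note that $\mathbb{E}N = \delta^2\Delta^3 n$. Let us now compare the probability of this path under the two
distributions. For $P_1$ we get

$$P_1(p) = \frac{1}{N}.$$

On the other hand, under $P_2$,

$$P_2(p) = \frac{1}{\Delta n}\cdot\frac{1}{|N_W(v)|}\cdot\frac{1}{|N_W(v')|}.$$

Note that for every $v \in V$, we have $\mathbb{E}|N_W(v)| = \delta\Delta$. It now suffices to argue the bound

$$\mathbb{E}\big[\mathrm{TV}(P_1, P_2)\big] = \mathbb{E}\,\frac{1}{2}\sum_{p}\left|\frac{1}{N} - \frac{1}{|N_W(v)|\,|N_W(v')|\,\Delta n}\right| \le \varepsilon. \qquad (32)$$

Let us call a path $p = (w, v, v', w')$ good if

$$\frac{1}{|N_W(v)|\cdot|N_W(v')|} = \frac{1 \pm \varepsilon'}{\delta^2\Delta^2}.$$

Later we will choose $\varepsilon' = \Omega(\varepsilon)$ to be sufficiently small, say, $\varepsilon' = \varepsilon/100$. We need the following simple
concentration bounds.

Claim A.2. With probability $1 - \varepsilon'$ over the choice of $W$, we have

1. $N = (1 \pm \varepsilon')(\delta^2\Delta^3 n)$.

2. The fraction of bad paths is less than $1/O(\varepsilon'^5(\delta\Delta)^3)$.

Proof. The first claim follows from Lemma D.1. Regarding the second claim, it is not hard to show for every
$v, v'$ that

$$\Pr\left[\frac{1}{|N_W(v)|\,|N_W(v')|} \ne \frac{1 \pm \varepsilon'}{\delta^2\Delta^2}\right] \le \frac{1}{\Theta(\varepsilon'^4(\delta\Delta)^3)}.$$

This can be shown by computing the fourth moment $\mathbb{E}(|N_W(v)| - \delta\Delta)^4$ and bounding the probability of a
factor $1 + \alpha$ deviation of $|N_W(v)|$ from its mean for small enough $\alpha = \Omega(\varepsilon')$. This argument shows that the
expected fraction of bad paths is at most $1/O(\varepsilon'^4(\delta\Delta)^3)$ and the claim is completed by applying Markov's
inequality. ■

Given this claim, we can finish the proof of the lemma. Indeed, letting $\mathcal{G}$ denote the set of good paths, we
have with probability $1 - \varepsilon'$,

$$
\begin{aligned}
\frac{1}{2}\sum_{p}\left|\frac{1}{N} - \frac{1}{|N_W(v)|\,|N_W(v')|\,\Delta n}\right|
&\le \sum_{p\in\mathcal{G}}\frac{O(\varepsilon')}{\delta^2\Delta^3 n} + \sum_{p\notin\mathcal{G}}\frac{1}{\Delta n}\\
&\le O(\varepsilon') + \frac{N}{O(\varepsilon'^5(\delta\Delta)^3)}\cdot\frac{1}{\Delta n}\\
&= O(\varepsilon') + \frac{1}{O(\varepsilon'^5(\delta\Delta)^3)}\cdot\frac{N}{\Delta n}\\
&\le O(\varepsilon').
\end{aligned}
$$

In the first inequality we used the fact that $|N_W(v)| \ge 1$ for any existing path and hence the term
$1/(|N_W(v)|\,|N_W(v')|\,\Delta n)$ is never larger than $1/\Delta n$. In the last step we used that we may choose $\delta\Delta \ge C\varepsilon'^{-6}$ for
a sufficiently large constant $C > 0$, and that $N \le (1 + \varepsilon')\delta^2\Delta^3 n$. Hence,

$$\mathbb{E}\,\mathrm{TV}(P_1, P_2) \le (1 - \varepsilon')\,O(\varepsilon') + \varepsilon' \le \varepsilon. \qquad\blacksquare$$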



B Details on random geometric graphs

In this section we will fill in the details that were left out in Section 5. We start with the proof of Lemma 5.3.

Lemma 5.3 (Restated). $\mathrm{sdp}_3(G_\gamma) \le 1 - \Omega(\sqrt{\gamma})$.

The proof works as follows. First, triangle inequalities are known to imply the odd cycle constraints,
which means that an SDP with triangle inequalities on an odd cycle of length $k$ has value at most (and, in fact,
equal to) $1 - 1/k$.

Lemma B.1. Let $C$ be an odd cycle of length $k$. Then, $\mathrm{sdp}_3(C) \le 1 - 1/k$.
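As a sanity check on the value $1 - 1/k$, the integral Max-Cut optimum of an odd cycle can be computed by brute force; it equals $(k-1)/k$, which is the value the lemma says the relaxation with triangle inequalities does not exceed. The function name below is hypothetical and the enumeration is only practical for small $k$.

```python
from itertools import product

def max_cut_fraction_cycle(k):
    """Largest fraction of edges of the k-cycle cut by a 2-colouring (brute force)."""
    best = 0
    for x in product((0, 1), repeat=k):
        # Edge (i, i+1 mod k) is cut when its endpoints get different colours.
        cut = sum(x[i] != x[(i + 1) % k] for i in range(k))
        best = max(best, cut)
    return best / k
```

For odd $k$ every 2-colouring leaves at least one edge uncut, which is why the fraction cannot reach 1.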

Second, it follows that if a graph $G$ can be covered uniformly by odd cycles of length $k$, then its $\mathrm{sdp}_3$-value
can be at most $1 - 1/k$.

Lemma B.2. Let $G = (V, E)$ be a (possibly infinite) graph. Suppose there exists a distribution $\mathcal{C}$ over odd
cycles of length $k$ for some fixed number $k$ such that the marginal distribution on each edge of a random cycle
from $\mathcal{C}$ has statistical distance $\varepsilon$ to the uniform distribution over edges in $G$. Then, $\mathrm{sdp}_3(G) \le 1 - 1/k + \varepsilon$.
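Proof. By our assumption we have that for every embedding $f : V \to B$,

$$\mathbb{E}_{(u,v)\sim E}\,\tfrac{1}{4}\|f(u) - f(v)\|^2 \le \mathbb{E}_{C\sim\mathcal{C}}\,\mathbb{E}_{(u,v)\sim C}\,\tfrac{1}{4}\|f(u) - f(v)\|^2 + \varepsilon.$$

But we know, by Lemma B.1, that for every $f : V \to B$ satisfying the triangle inequalities,

$$\mathbb{E}_{(u,v)\sim C}\,\tfrac{1}{4}\|f(u) - f(v)\|^2 \le 1 - \frac{1}{k}.$$

Hence,

$$\mathbb{E}_{(u,v)\sim E}\,\tfrac{1}{4}\|f(u) - f(v)\|^2 \le 1 - \frac{1}{k} + \varepsilon. \qquad\blacksquare$$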

We will next see that the sphere graph can be uniformly covered by odd cycles of length $O(1/\sqrt{\gamma})$. We
begin with the following simple observation.

Lemma B.3. For every $l \in [1 - \gamma, 1 - \gamma/2]$, there exists an odd cycle, denoted $C_l = (v_1, \dots, v_k)$, in $G_\gamma$ of
length $k = O(1/\sqrt{\gamma})$ such that $\tfrac{1}{4}\|v_i - v_{i+1}\|^2 = l$ for all $i \in \{1, \dots, k - 1\}$.

Proof sketch. Pick an arbitrary great circle around the sphere and place the vertices $v_1, \dots, v_k$ equally spaced
along this circle. For $k = O(1/\sqrt{\gamma})$ vertices, we can ensure that the Euclidean distance between two consecutive
vertices is less than, say, $\sqrt{\gamma}/10$. Now connect each vertex $v$ on the circle to the unique vertex $w$ which
maximizes $\|v - w\|^2$. This creates an odd cycle and, by our previous observation, it follows that $\tfrac{1}{4}\|v - w\|^2 \ge 1 - \gamma$.

Now we can make $\tfrac{1}{4}\|v - w\|^2 = l$ by walking along the cycle and moving vertices in a direction orthogonal to
the plane defined by the circle until all edges have length $l$. ■

Lemma B.4. Let $\gamma > 0$ and let $S^{d-1}$ be the sphere. There exists a distribution $\mathcal{C}$ over odd cycles $C =
(v_1, \dots, v_k)$ for some $k = O(1/\sqrt{\gamma})$ such that for all $i$, the marginal distribution of $(v_i, v_{i+1})$ has statistical distance
$o(1)$ to the uniform distribution over edges in $G_\gamma$ (as $d \to \infty$).

Proof. We will describe the distribution $\mathcal{C}$ as follows:

1. Pick a random edge $e = (u, v) \in E$ from $G_\gamma$.

2. Let $l = \tfrac{1}{4}\|u - v\|^2$. If $l \le 1 - \gamma/2$, let $C_l = (v_1, v_2, \dots, v_k)$ denote the odd cycle given by Lemma B.3. If
$l > 1 - \gamma/2$, declare "failure".

3. If the previous step succeeded, pick a random rotation $R$ and output $RC = (Rv_1, Rv_2, \dots, Rv_k)$.

We claim that if the second step succeeds, then indeed every marginal $(Rv_i, Rv_{i+1})$ is distributed like
a uniformly random edge. This is because (1) $(u, v)$ was chosen to be a uniformly random edge and (2)
$(Rv_i, Rv_{i+1})$ is a random rotation of $(u, v)$ and hence, by spherical symmetry, is equally likely to be any edge in
$E$ that has the same length as $(u, v)$.

On the other hand, by measure concentration, with probability $1 - \exp(-\Omega(d))$, we have that $\tfrac{1}{4}\|u - v\|^2 \in
[1 - \gamma, 1 - \gamma/2]$. This completes the claim since the probability of failure only introduces $o(1)$ statistical
distance. ■

In this section we give some details on how to obtain a dense discretization of the Feige-Schechtman 
graph. 
Lemma 5.4 (Restated). Fix $\gamma \in [0, 1]$, $d \in \mathbb{N}$. Then, there exists an $n_0(d, \gamma) \in \mathbb{N}$ so that if we pick $V \subseteq S^{d-1}$
uniformly at random with $|V| \ge n_0$, then the induced subgraph $G_\gamma[V]$ satisfies (1) $\mathrm{opt}(G_\gamma[V]) = 1 - O(\sqrt{\gamma})$,
and (2) $\mathrm{sdp}_3(G_\gamma[V]) = 1 - O(\sqrt{\gamma})$.

Proof sketch. The first claim is shown in [FS02]. For the second claim, let us decompose $S^{d-1}$ into equal
volume cells of diameter at most $\varepsilon$. Here, $\varepsilon$ is a parameter that we will later take to be very small, say,
$\varepsilon \le \gamma/100$. Now pick enough vectors $V \subseteq S^{d-1}$ uniformly at random such that with probability at least $1 - \varepsilon$
every two cells contain the same number of vectors up to a factor of $1 \pm \varepsilon$.

We need to show that $\mathrm{sdp}_3(G_\gamma[V]) \le 1 - \Omega(\sqrt{\gamma})$. To this end we first consider a related graph $G'$, which
has the same vertex set as $G$ but different edges. A random edge in $G'$ is defined by the following process:
Pick first a random edge on the continuous sphere, then for each endpoint pick a random vertex in the equal
volume cell containing the endpoint. Finally, normalize the edges such that the total edge weight is the same
as in $G$.

We can use the distribution over odd cycles given by Lemma B.4 in order to get a distribution for the
graph $G'$ as follows: Pick the cycle and map each point to a vertex in the corresponding cell. The resulting
marginal distributions will be uniform in $G'$. Thus, $\mathrm{sdp}_3(G') = 1 - O(\sqrt{\gamma})$.

Finally, we will show that $\mathbb{E}\,\|L(G') - L(G_\gamma[V])\|_{\mathrm{TV}}$ tends to zero with $\varepsilon$. That is, the two distributions have
statistical distance tending to zero. This also shows that for sufficiently small $\varepsilon$, the semidefinite programs
also have approximately the same value. Now to argue the above point, consider the process of picking a
random edge. Consider first the case that in $G'$, the two cells containing the chosen points have exactly the
expected number of vectors in them, and furthermore, suppose that the two cells are good in the sense that
either none of the vertices in them share an edge or all pairs of vertices between the two cells share an edge
in $G$. In this case, the edges in $G$ going between these two cells have exactly the same probability as under $G'$.

The first assumption is close enough to the truth, since the number of vertices in different cells differ by
at most a factor of $1 \pm \varepsilon$. For the second assumption it suffices to pick $\varepsilon$ small enough so that a cap of radius $r$
has the same volume as a cap of radius $r \pm \varepsilon$ up to a factor of $1 \pm o(1)$. This happens for, say, $\varepsilon \ll 1/d$. This
will guarantee that the number of bad pairs of cells is small. This argument can be found in [FS02]. ■

C Subsampling edges 

In this section, we will briefly discuss the analogue of our main theorem in the setting where we sample 
a fraction of the edges in G at random so that the expected degree in G is constant. Here, G = {V,E) will 
always denote a A-regular graph on n vertices. Our proof in the case of edge subsampling is much simpler. 
As it turns out it suffices to bound the cut norm between the original graph and its subsample and to argue 
that the SDP value is a Lipschitz function of the cut norm. The latter fact is a consequence of Grothendieck's 
inequality. 

We let $E_\delta \subseteq E$ denote a random subset of $E$ of size $\delta|E|$. We'll overload notation slightly by using $G[E_\delta]$
for the graph $G$ restricted to the edge set $E_\delta$.

Definition C.1. The cut norm of a real valued $n \times n$ matrix $A$ is defined as

$$\|A\|_C = \max_{S,T\subseteq[n]}\Big|\sum_{i\in S,\ j\in T}a_{ij}\Big|. \qquad (33)$$
It is known that the cut norm is within constant factors of the norm

$$\|A\|_{\infty\to 1} = \max_{x_i, y_j\in\{-1,1\}}\sum_{i,j\in[n]}a_{ij}x_iy_j. \qquad (34)$$

A natural semidefinite relaxation of (34) replaces every pair $x_i, y_j$ by two unit vectors $u_i, v_j$, i.e.,

$$\mathrm{sdp}_C(A) = \max_{\|u_i\|=\|v_j\|=1}\sum_{i,j\in[n]}a_{ij}\langle u_i, v_j\rangle. \qquad (35)$$

A theorem of Grothendieck bounds the gap between the cut norm and its relaxation by a multiplicative
constant (the Grothendieck constant).

Theorem C.2. There is a constant $K_G$ (known to be less than 1.8) such that $\mathrm{sdp}_C(A) \le K_G\|A\|_{\infty\to 1}$.
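The relation between the cut norm and the $\infty\to 1$ norm can be checked numerically on tiny matrices. The sketch below uses the standard definitions (maximum over subsets $S, T$ of $|\sum_{i\in S, j\in T} a_{ij}|$, and maximum over sign vectors of $\sum a_{ij}x_iy_j$) and the standard fact $\|A\|_C \le \|A\|_{\infty\to1} \le 4\|A\|_C$; the function names are hypothetical and the enumeration is exponential in $n$.

```python
from itertools import product

def cut_norm(A):
    """max over row subsets S and column subsets T of |sum_{i in S, j in T} a_ij|."""
    n, m = len(A), len(A[0])
    best = 0.0
    for s in product((0, 1), repeat=n):
        col = [sum(A[i][j] for i in range(n) if s[i]) for j in range(m)]
        # For fixed S the best T takes all positive (or all negative) column sums.
        pos = sum(v for v in col if v > 0)
        neg = -sum(v for v in col if v < 0)
        best = max(best, pos, neg)
    return best

def inf_to_one_norm(A):
    """max over x, y in {-1, 1}^n of sum_ij a_ij x_i y_j."""
    n, m = len(A), len(A[0])
    best = 0.0
    for x in product((-1, 1), repeat=n):
        # For fixed x the best y_j is the sign of the j-th column sum.
        total = sum(abs(sum(A[i][j] * x[i] for i in range(n))) for j in range(m))
        best = max(best, total)
    return best
```

For $A = \begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix}$ the cut norm is 1 and the $\infty\to1$ norm is 4, so the factor 4 is attained.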

The next lemma shows that the cut norm between a graph and its subsample is small.

Lemma C.3. Let $\delta \ge c\varepsilon^{-2}\Delta^{-1}$. Then, $\mathbb{E}\,\|A(G) - \delta^{-1}A(G[E_\delta])\|_{\infty\to 1} \le \varepsilon$.

Proof. Write $A = A(G)$ and $A' = A(G[E_\delta])$. We can show that with high probability $|\langle x, Ay\rangle - \delta^{-1}\langle x, A'y\rangle| \le \varepsilon$ simultaneously for all $x, y \in
\{-1, 1\}^n$. The proof follows from Hoeffding's bound and the union bound. The details are straightforward and
therefore omitted from this paper. ■

Similarly the following lemma can be shown.

Lemma C.4. Let $\delta \ge c\varepsilon^{-2}\Delta^{-1}$. Then, $\mathbb{E}\,\|D(G) - \delta^{-1}D(G[E_\delta])\|_{\infty\to 1} \le \varepsilon$.

The previous two lemmas showed that the expected difference in cut norm between the graph $G$ and its
edge subsample $G[E_\delta]$ is small.

Corollary C.5. For $\delta \ge c\varepsilon^{-2}\Delta^{-1}$, we have $\mathbb{E}\,\|L(G) - \delta^{-1}L(G[E_\delta])\|_C \le \varepsilon$.

It turns out that bounding the difference in cut norm is sufficient for bounding the difference in SDP 
values. 

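Lemma C.6. Let $G$ and $G'$ be any two graphs on $n$ vertices. Let $\mathcal{M} \subseteq \mathcal{M}_2$ (see Definition 7.1) be any set of
positive semidefinite $n \times n$ matrices. Suppose $\|L(G) - L(G')\|_C \le t$. Then,

$$|\mathrm{sdp}_{\mathcal{M}}(G) - \mathrm{sdp}_{\mathcal{M}}(G')| \le O(t).$$

Proof.

$$
\begin{aligned}
|\mathrm{sdp}_{\mathcal{M}}(G) - \mathrm{sdp}_{\mathcal{M}}(G')|
&\le \max_{X\in\mathcal{M}_2}\,(L(G) - L(G'))\bullet X\\
&\le O(1)\cdot\|L(G) - L(G')\|_C && \text{(by Theorem C.2)}\\
&\le O(t). \qquad\blacksquare
\end{aligned}
$$

Corollary C.7. Let $G$ denote a $\Delta$-regular graph and let $\delta \ge \mathrm{poly}(1/\varepsilon)\Delta^{-1}$. Then,

$$\mathbb{E}\,|\mathrm{sdp}_{\mathcal{M}}(G) - \mathrm{sdp}_{\mathcal{M}}(G[E_\delta])| \le \varepsilon \qquad (36)$$

for any $\mathcal{M} \subseteq \mathcal{M}_2$.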
Negative results for linear programs. We remark that using the approach of [CMM09] one can obtain
strong and general results ruling out subsampling for linear programs.

Theorem C.8. Let e,A> 0. Suppose G is a {^.-regular graph with A > rfi. Then with high probability over 
G' - G[£'^/a], after removing o{n) vertices, Ip^(G') > 1 - e/or r - n" where a(l/e, 1/0, /^) tends to zero as 
any of its arguments grows. 

The proof follows by arguing that G[£'^/a] has sufficient small set expansion so that [CMM09] applies. 
Details are omitted. 



D Deviation bounds 



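Deviation bounds for submatrices. The following general lemma is useful in bounding the deviation of
expressions $\sum_{i,j\in S}a_{ij}$ when $S$ denotes a random subset of $[n]$ and $A$ is an $n \times n$ matrix.

Lemma D.1. Let $A$ denote a symmetric $n \times n$ matrix such that $a_{ii} = 0$ for all $i \in [n]$. Suppose there is some
$\beta > 0$ such that $|a_{ij}| \le \beta$ for all $i, j \in [n]$ and $\sum_j|a_{ij}| \le 1$ for all $i$. Now, let $S \subseteq [n]$ denote a random subset of
$[n]$ of size $\delta n$ for some $\delta \ge \beta$. Then, for all $\varepsilon > 0$,

$$\Pr\left\{\Big|\sum_{i,j\in S}a_{ij} - \mathbb{E}\sum_{i,j\in S}a_{ij}\Big| > \varepsilon n\right\} \le \frac{O(1)}{\varepsilon^2\delta n}. \qquad (37)$$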



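Proof. Denote by $X_{ij}$ the random variable which is equal to $a_{ij}$ when both $i \in S$ and $j \in S$ and is zero
otherwise. Let $\mu_{ij} = \mathbb{E}X_{ij} = \delta^2a_{ij}$. Putting $X = \sum_{i,j\in[n]}X_{ij}$ and $\mu = \mathbb{E}X$ we will compute the variance of $X$.
The key fact that we will use is that the selection of $i, j$ and $k, l$ is independent unless either $i = k$ or $j = l$.
Pairs where neither is the case will not contribute to the variance. More precisely,

$$
\begin{aligned}
\mathbb{E}(X - \mu)^2
&= \mathbb{E}\Big(\sum_{i,j}(X_{ij} - \mu_{ij})\Big)^2
= \mathbb{E}\sum_{i,j,k,l}(X_{ij} - \mu_{ij})(X_{kl} - \mu_{kl})\\
&= \sum_{i,j}\mathbb{E}(X_{ij} - \mu_{ij})^2 + \sum_{i,j,k}\mathbb{E}(X_{ij} - \mu_{ij})(X_{kj} - \mu_{kj})\\
&= \sum_{i,j}O(\delta^2)a_{ij}^2 + \sum_{i,j,k}O(\delta^3)a_{ij}a_{kj}.
\end{aligned}
$$

At this point notice that $\sum_{i,j}a_{ij}^2$ is maximized when in every row we have $1/\beta$ entries of magnitude $\beta$, in
which case the expression evaluates to $\beta n$. Likewise the second expression $\sum_{i,j,k}a_{ij}a_{kj}$ is maximized
when in every column $j \in [n]$ we have $1/\beta$ nonzero entries of magnitude $\beta$. In this case the expression is
$(1/\beta)^2\beta^2n = n$. Hence,

$$\sigma^2 = \mathbb{E}(X - \mu)^2 \le O(\delta^2\beta n) + O(\delta^3n) \le O(\delta^3n), \qquad (38)$$

where we used that $\delta \ge \beta$. Hence by Chebyshev's inequality,

$$\Pr\big(|X - \mu| > \varepsilon\delta^2n\big) \le \frac{\sigma^2}{\varepsilon^2\delta^4n^2} \le \frac{O(1)}{\varepsilon^2\delta n}.$$

This is what we claimed up to scaling. ■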
In the proof of the proxy graph theorem we used the following simple observation relating the Laplacian
of a subsample $L(G[V_\delta])$ to the corresponding principal submatrix of the Laplacian $L(G)_{V_\delta}$.

Lemma D.2. Let $G$ be a $\Delta$-regular graph and let $H$ be a graph of degree at least $\Delta$. Let $\delta \ge \mathrm{poly}(1/\varepsilon)\Delta^{-1}$.
Then,

$$\mathbb{E}\,\|L(G[V_\delta]) - L(G)_{V_\delta}\|_{\mathrm{TV}} \le \varepsilon.$$

Proof. By inspection of the two matrices we see that the difference in the entries of the matrix is due to
irregularities in the degrees of $G[V_\delta]$. Specifically, the matrix $L(G)_{V_\delta}$ has diagonal entries equal to $1/\delta n$. On
the other hand, the $i$-th diagonal entry of $L(G[V_\delta])$, call it $d_i$, is equal to $\sum_{j\in V_\delta}a_{ij}$. Note that $\mathbb{E}\,d_i = 1/\delta n$
and we claim,

$$\sum_{i\in V_\delta}\Big|d_i - \frac{1}{\delta n}\Big| \le \varepsilon.$$

This can be derived from Lemma D.1. ■



McDiarmid's inequality. We also needed McDiarmid's large deviation bound (sometimes called Azuma's 
inequality). 

Lemma D.3. Let $X_1, \dots, X_m$ be independent random variables all taking values in the set $\mathcal{X}$. Further, let
$f : \mathcal{X}^m \to \mathbb{R}$ be a function of $X_1, \dots, X_m$ that satisfies for all $i$ and all $x_1, x_2, \dots, x_m, x_i' \in \mathcal{X}$,

$$|f(x_1, \dots, x_i, \dots, x_m) - f(x_1, \dots, x_i', \dots, x_m)| \le c_i.$$

Then, for all $t > 0$,

$$\Pr\{|f - \mathbb{E}[f]| \ge t\} \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^{m}c_i^2}\right).$$
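The bounded-differences bound can be compared against an exact tail in the simplest case. The sketch assumes $f$ is the empirical mean of $m$ independent fair coin flips, so changing one flip moves $f$ by at most $c_i = 1/m$ and the standard bound reads $2\exp(-2t^2m)$; the function names are hypothetical.

```python
import math

def exact_tail(m, t):
    """Pr[|mean of m fair coin flips - 1/2| >= t], computed exactly."""
    hits = sum(math.comb(m, j) for j in range(m + 1) if abs(j / m - 0.5) >= t)
    return hits / 2 ** m

def mcdiarmid_bound(m, t):
    """2 exp(-2 t^2 / sum_i c_i^2) with c_i = 1/m, i.e. 2 exp(-2 t^2 m)."""
    return 2 * math.exp(-2 * t * t * m)
```

Evaluating both for a few values of `m` and `t` shows the exact tail sitting below the bound, with the gap widening as `t` grows.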